1. Introduction


Consider the situation when a number of physically distinct computation units work on a common problem, while their operation is coordinated via communication channels connecting some of these units. Each computation unit has certain processing and memory capability and is preprogrammed to perform its part of the computation, as well as to receive and send control messages over the communication channels. The program residing in each node will be referred to as the node algorithm, and the ensemble of all algorithms providing the solution to the common problem is named a distributed protocol.

For the purpose of this paper it is convenient to regard the computation units as nodes in a network whose links are the connecting communication channels. The specific protocols considered here will be collectively called Distributed Network Protocols (DNP) to indicate that the common problem to be solved is connected with the network topology. Many of the "classical" graph algorithms have distributed versions and, in addition, several new distributed network protocols arise from practical problems. The main application considered so far for DNP's is in data or voice communication networks. In such networks, geographically dispersed devices must transmit information to one another and must somehow coordinate this transmission. With the advances of mini- and micro-computers, it is certainly feasible that nodes will have their own processing and memory units and will serve as communication processors and/or as switches. In principle, the common goal of all these units is to transmit the required information efficiently so as to achieve certain performance goals, like minimum delay or maximum throughput. With this application in mind, examples of problems for which DNP's have been proposed or are currently under investigation include routing of information, shortest paths, minimum-weight spanning trees, common-channel random-access coordination and others.


The main purpose of the present paper is to give a formal description and rigorous validation of a number of DNP's, some of which have been presented previously in an intuitive way and some of which are proposed here for the first time. We mainly consider DNP's for the purpose of connectivity tests, shortest paths in terms of number of links, and routing-path updating. In addition, we give a unifying approach to the validation of the protocols by presenting several simple basic DNP's that provide building blocks for the presented protocols.

The presented protocols have one additional important feature. Since nodes and links may fail and be added asynchronously to the network, the protocols must be able to work under arbitrarily changing network topology. Although we first consider DNP's for networks with fixed topology, in Sec. 7 we extend those protocols to incorporate cases of changing topology.


2. The General Model


In this section we give the general model and assumptions used in all presented DNP's. Consider a network $(V,E)$ where $V$ is a set of nodes and $E \subseteq V \times V$ is a set of links. For the first part of this paper, we assume that the network has fixed topology. We shall use the following assumptions:

a) Each link is bidirectional; the link connecting node i with node j, considered in the direction from i to j, is denoted (i,j).

b) All messages referred to in this paper are control messages.

c) On each link in each direction there is a link protocol that ensures that each message sent by node i, say on link (i,j), will arrive correctly within finite, nonzero, undetermined time, and that all messages are received at node j in the same order as they were sent by i (observe that we do not preclude channel errors, provided that there exists a proper detection/retransmission or correction algorithm on each link).

d) All messages received at a node i are stamped with the identification of the link from which they came and are then transferred into a common queue; each node uses one processor for the purpose of the algorithm; the processor extracts the control message at the head of the queue, proceeds to process it and discards the message when processing is completed; no other operation related to the protocol is performed by the processor while a message is being processed; consequently, we may assume that the processing of each message takes zero time.

e) Each node has an identification; before the protocol starts, each node knows the identity of all nodes that are potentially in the network; it knows nothing about the topology of the network and in particular about what nodes actually belong to the network. We denote by 1,2,...,|V| the nodes that are potentially in the network and, when needed, by 1,2,...,|v| the nodes actually belonging to the network.

f) Each node knows its adjacent links, but not necessarily the identity of its neighbors, i.e. the nodes at the other end of the links; however, in our algorithms it will be convenient to use expressions like "send messages to all neighbors", meaning "send messages over all adjacent links". The collection of all neighbors of node i will be denoted by $G_i$.

g) Unless otherwise stated, the protocol can be started by any node or by several nodes asynchronously; a node starts the algorithm by receiving a special message "START" from the outside world; a standing assumption is that, once a node has entered the algorithm, it cannot receive "START".
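To make assumptions c) and d) concrete, here is a minimal Python sketch of an event-driven simulator. The class name `Net`, the delay range 0.1 to 1.0, and the use of one global event heap are illustrative assumptions, not part of the model. Each directed link delays messages by a random finite amount but never lets a later message overtake an earlier one, and messages are handed to nodes one at a time in arrival order, with zero processing time.

```python
import heapq
import random

class Net:
    """Hypothetical simulator for the Sec. 2 model: FIFO links with random
    finite delays (assumption c), zero-time sequential processing (assumption d)."""
    def __init__(self, adj, seed=0):
        self.adj, self.rng, self.now = adj, random.Random(seed), 0.0
        self.queue, self.seq, self.last, self.sent = [], 0, {}, {}

    def _push(self, t, dst, src, msg):
        # seq breaks ties in send order, so the heap never compares messages
        heapq.heappush(self.queue, (t, self.seq, dst, src, msg))
        self.seq += 1

    def send(self, src, dst, msg):
        # Random delay, never earlier than the previous message on (src, dst): FIFO.
        t = max(self.now + self.rng.uniform(0.1, 1.0), self.last.get((src, dst), 0.0))
        self.last[(src, dst)] = t
        self.sent[(src, dst)] = self.sent.get((src, dst), 0) + 1
        self._push(t, dst, src, msg)

    def run(self, handler, starters):
        for s in starters:                        # START from the outside world
            self._push(0.0, s, None, "START")
        while self.queue:                         # global time order = per-node queue order
            self.now, _, dst, src, msg = heapq.heappop(self.queue)
            handler(dst, src, msg)

net = Net({1: [2], 2: [1]}, seed=7)
received = []
for k in range(5):
    net.send(1, 2, k)                             # five messages queued on link (1, 2)
net.run(lambda dst, src, msg: received.append(msg), starters=[])
print(received)  # [0, 1, 2, 3, 4]
```

Whatever delays are drawn, the five messages arrive in the order they were sent, which is the only ordering guarantee the protocols below rely on.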


3.  Basic  Protocols 
The two basic DNP's presented in this section provide a way of broadcasting information in the network.

3.1  Propagation  of  Information  (PI) 

Suppose that node j receives from the outside world a piece of information that has to be transmitted to all nodes in the network. The simplest procedure to accomplish this is for node j to transmit a message containing this information to all its neighbors and for each other node k in the network, when it receives the first such message, to send a similar message to its own neighbors. All other messages received at k are disregarded. We shall now formally present the algorithm for each node and validate the protocol.

PROTOCOL  PI 

Variables  of  the  algorithm  at  node  i 

$m_i$ shows if node i has already entered the algorithm (values 0,1).

Messages  sent  and  received  by  the  algorithm  at  node  i 

MSG  -  message  sent  by  node  i  ; 

MSG(l)  -  message  received  from  neighbor  l  ; 

START  -  message  received  from  the  outside  world  ; 

It  is  assumed  that  each  message  carries  the  piece  of  information  that  has 
to  be  propagated. 

Algorithm for node i

Assumption: just before entering the algorithm, node i has $m_i = 0$.

1. For START or MSG(l)

2. if $m_i = 0$, then: $m_i \leftarrow 1$; send MSG to all neighbors.


Theorem  PI-1 


Suppose a node j receives START. Then:

a) All nodes i connected to j (i.e. that are in the connected network containing j) will set $m_i \leftarrow 1$ in finite time.

b) During the execution of the protocol, exactly one MSG is sent on each link in each direction.

c) The propagation of information is the fastest possible in the following sense: for a node i, let $p_i$ be the node from which node i receives the first MSG (see line <2> of the Algorithm). For a link (i,l), let the weight $w_{il}$ of that link be the time it took for MSG to travel from i to l, i.e. from the time i sends MSG on (i,l) until the time the processor at l starts operating on the MSG (this includes propagation and queueing time). Then the collection of links $\{(p_i, i)$, for all $i \neq j$ in the network$\}$ forms the tree of shortest distances from j to all nodes.

Proof 

The proof of all properties is straightforward and we give here only an outline. Property a) follows by induction on the distance (in terms of number of links) from node j. Suppose all nodes i that are at distance r from j perform $m_i \leftarrow 1$, and hence send MSG to all their neighbors. Then a node k at distance r+1 is a neighbor of a node at distance r, and when receiving MSG from it, either this is the first message at k, in which case $m_k$ becomes 1, or it is not, in which case $m_k$ is 1 already. Property b) follows from the fact that for every node i, the parameter $m_i$ becomes 1 exactly once, at which time node i sends MSG on all adjacent links. Property c) holds because if there were a shorter route from j to i, node i would have received MSG on that route before receiving MSG from $p_i$.
Before proceeding to the second algorithm, we may note that the protocol will work correctly even if "START" is delivered to several nodes at arbitrary times, provided that each of these nodes has not entered the algorithm before receiving "START". Properties a) and b) still hold and the propagation is still the fastest possible.
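The following minimal Python sketch runs PI on a small example graph under the Sec. 2 assumptions. The simulator class `Net` (repeated so the block runs on its own), the function `run_pi`, the delay range and the example graph are all illustrative assumptions; the handler body mirrors lines 1 and 2 of the algorithm. Starting at two nodes exercises the remark that START may be delivered to several nodes.

```python
import heapq
import random

class Net:
    """Hypothetical simulator: FIFO links with random finite delays (assumption c),
    zero-time sequential processing at each node (assumption d)."""
    def __init__(self, adj, seed=0):
        self.adj, self.rng, self.now = adj, random.Random(seed), 0.0
        self.queue, self.seq, self.last, self.sent = [], 0, {}, {}

    def _push(self, t, dst, src, msg):
        heapq.heappush(self.queue, (t, self.seq, dst, src, msg))
        self.seq += 1

    def send(self, src, dst, msg):
        t = max(self.now + self.rng.uniform(0.1, 1.0), self.last.get((src, dst), 0.0))
        self.last[(src, dst)] = t
        self.sent[(src, dst)] = self.sent.get((src, dst), 0) + 1
        self._push(t, dst, src, msg)

    def run(self, handler, starters):
        for s in starters:
            self._push(0.0, s, None, "START")
        while self.queue:
            self.now, _, dst, src, msg = heapq.heappop(self.queue)
            handler(dst, src, msg)

def run_pi(adj, starters, seed=0):
    """Protocol PI: returns m (entered flags), p (first sender) and per-link counts."""
    net = Net(adj, seed)
    m = {i: 0 for i in adj}
    p = {}
    def handler(i, l, msg):          # line 1: for START or MSG(l)
        if m[i] == 0:                # line 2: only the first message triggers action
            m[i] = 1
            p[i] = l                 # None for a node that received START
            for k in adj[i]:
                net.send(i, k, "MSG")
    net.run(handler, starters)
    return m, p, net.sent

# Component {1, 2, 3, 4} plus a separate component {5, 6}.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
m, p, sent = run_pi(adj, starters=[1, 4])
print(m)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0}
```

Because `p[i]` records the first sender, the same run also yields the tree of Theorem PI-1 c) with respect to the simulated delays, and `sent` shows one MSG per link per direction, as in Theorem PI-1 b).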

3.2 Propagation of Information with Feedback (PIF)

Sometimes a node s that receives START and propagates information may want to be positively informed when the information has indeed reached all connected nodes. Here, of course, the assumption is that only one node can receive START. The following protocol can be used for this purpose. When receiving START, node s sends $\mathrm{MSG}^s$ to all neighbors. When receiving any $\mathrm{MSG}^s$, an arbitrary node i marks the link from which it was received. When receiving the first $\mathrm{MSG}^s$, from neighbor l say, a node i denotes this neighbor with a special mark $p_i^s$ and sends $\mathrm{MSG}^s$ to all neighbors except $p_i^s$. When it observes that it has received $\mathrm{MSG}^s$ from all neighbors, a node i other than s sends $\mathrm{MSG}^s$ to $p_i^s$. It is shown below that receipt of $\mathrm{MSG}^s$ from all neighbors at node s can be interpreted as the signal that the information has indeed reached all connected nodes. In this way, the propagation of MSG's occurs in two waves:

(i)  from  node  s  into  the  network  for  purposes  of  propagating  information,  and 

(ii)  from  the  network  back  to  node  s  for  the  purpose  of  acknowledgment. 

The  formal  description  of  the  protocol  follows. 

PROTOCOL  PIF 

The  algorithm  for  node  s  that  receives  START  is  different  from  the  algorithm 
for  all  other  nodes.  We  shall  first  give  the  algorithm  for  an  arbitrary 
node  i  other  than  s  and  then  for  node  s. 

Variables of the algorithm at node $i \neq s$

$m_i^s$ shows if node i is currently participating in the protocol (values 0,1);

$N_i^s(l)$ marks receipt of $\mathrm{MSG}^s$ from neighbor l (values 0,1), $l \in G_i$;

$p_i^s$ - neighbor from which $\mathrm{MSG}^s$ was received first.

$\mathrm{MSG}^s$ and $\mathrm{MSG}^s(l)$ have the same meaning as MSG and MSG(l) in PI.

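Algorithm for node $i \neq s$

Assumption: just before entering the algorithm, node i has $m_i^s = 0$, $p_i^s = \text{nil}$, $N_i^s(l) = 0$ for all $l \in G_i$.

1. For $\mathrm{MSG}^s(l)$

2. $N_i^s(l) \leftarrow 1$;

3. if $m_i^s = 0$, then: $m_i^s \leftarrow 1$; $p_i^s \leftarrow l$; send $\mathrm{MSG}^s$ to all neighbors except $p_i^s$.

4. if $\forall l' \in G_i$ holds $N_i^s(l') = 1$, then: send $\mathrm{MSG}^s$ to $p_i^s$; $m_i^s \leftarrow 0$; $\forall l' \in G_i$, set $N_i^s(l') \leftarrow 0$.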

Algorithm for node s

For node s, the variables are $m_s^s$ and $N_s^s(l')$ for all $l' \in G_s$, the messages are $\mathrm{MSG}^s(l)$ and START, and the algorithm is:

3. For START

3a. $m_s^s \leftarrow 1$; send $\mathrm{MSG}^s$ to all neighbors.

1. For $\mathrm{MSG}^s(l)$

2. $N_s^s(l) \leftarrow 1$;

4. if $\forall l' \in G_s$ holds $N_s^s(l') = 1$, then: $m_s^s \leftarrow 0$; $\forall l' \in G_s$, set $N_s^s(l') \leftarrow 0$.


Note: The lines in the algorithm for s have been numbered to denote operations similar to those in the algorithm for an arbitrary node i.


In order to analyse the protocol, we shall need the following notation:

$\langle * \rangle_i$ - the event of node i performing line <*> of its algorithm; whenever the corresponding line contains an if operation, the notation refers only to the cases when the condition indeed holds.

$t(*)$ - time when event * happens.

Theorem  PIF-1 

Suppose node s receives START. Then

a) all connected nodes i will perform the event $\langle 3\rangle_i$ in finite time and exactly once; after this happens, the links

$$\{(i, p_i^s) \text{ for all connected } i \neq s\}$$

will form a directed tree rooted at s; in addition, for all $i \neq s$

$$t(\langle 3\rangle_i) > t(\langle 3\rangle_{p_i^s}) \tag{3.1}$$

b) node s and all connected nodes i will perform $\langle 4\rangle$ in finite time and exactly once; moreover, for all $i \neq s$

$$t(\langle 3\rangle_i) < t(\langle 4\rangle_i) < t(\langle 4\rangle_{p_i^s}); \tag{3.2}$$

also, when node s performs $\langle 4\rangle_s$, all connected nodes will have completed the algorithm, i.e. performed $\langle 4\rangle$.

c) exactly one $\mathrm{MSG}^s$ travels on each link in each direction.

Proof 

a) and c) follow from Theorem PI-1. To prove b), let k be a leaf of the tree referred to in a), i.e. a node such that there is no l with $p_l^s = k$. Then all neighbors m of k will send $\mathrm{MSG}^s$ to k when they perform $\langle 3\rangle_m$. Node k will receive all these messages and will be able to perform $\langle 4\rangle_k$. At that time it will send $\mathrm{MSG}^s$ to $p_k^s$. The same will be true for all leaves. Now nodes that are on the last-but-one level of the tree will be able to perform $\langle 4\rangle$, and the procedure will continue up the tree all the way to node s. This argument clearly proves (3.2) and completes the proof of the Theorem.
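A minimal simulation sketch of PIF in the same style, assuming a hypothetical `Net` simulator (repeated so the block runs on its own) and a hypothetical `run_pif` function. It records the order in which nodes perform <4>, so that (3.2) can be checked: every node finishes before its parent $p_i^s$, and s finishes last.

```python
import heapq
import random

class Net:
    """Hypothetical simulator: FIFO links with random finite delays (assumption c),
    zero-time sequential processing at each node (assumption d)."""
    def __init__(self, adj, seed=0):
        self.adj, self.rng, self.now = adj, random.Random(seed), 0.0
        self.queue, self.seq, self.last, self.sent = [], 0, {}, {}

    def _push(self, t, dst, src, msg):
        heapq.heappush(self.queue, (t, self.seq, dst, src, msg))
        self.seq += 1

    def send(self, src, dst, msg):
        t = max(self.now + self.rng.uniform(0.1, 1.0), self.last.get((src, dst), 0.0))
        self.last[(src, dst)] = t
        self.sent[(src, dst)] = self.sent.get((src, dst), 0) + 1
        self._push(t, dst, src, msg)

    def run(self, handler, starters):
        for s in starters:
            self._push(0.0, s, None, "START")
        while self.queue:
            self.now, _, dst, src, msg = heapq.heappop(self.queue)
            handler(dst, src, msg)

def run_pif(adj, s, seed=0):
    """Protocol PIF started at s; returns parents p^s, order of <4> events, link counts."""
    net = Net(adj, seed)
    m = {i: 0 for i in adj}
    N = {i: {l: 0 for l in adj[i]} for i in adj}
    p = {i: None for i in adj}
    done = []                                   # nodes in the order they perform <4>
    def handler(i, l, msg):
        if msg == "START":                      # lines 3, 3a at node s
            m[i] = 1
            for k in adj[i]:
                net.send(i, k, "MSG")
            return
        N[i][l] = 1                             # line 2
        if i != s and m[i] == 0:                # line 3: first MSG^s at i != s
            m[i], p[i] = 1, l
            for k in adj[i]:
                if k != l:
                    net.send(i, k, "MSG")
        if all(N[i].values()):                  # line 4
            if i != s:
                net.send(i, p[i], "MSG")        # feedback toward s
            m[i] = 0
            for k in N[i]:
                N[i][k] = 0
            done.append(i)
    net.run(handler, [s])
    return p, done, net.sent

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
p, done, sent = run_pif(adj, s=1)
print(done[-1])  # 1: node s completes last
```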
4. Connectivity Test Protocols

The  purpose  of  this  class  of  DNP's  is  to  allow  each  node  to  learn  what 
nodes  are  connected  to  it. 

Protocol  CT1 

The idea here is to use protocol PI, first to inform all nodes that the protocol is in progress and then for each node to propagate its own identity. Any node (or several nodes) can start the protocol by receiving START. A node enters the protocol whenever it receives either START or the first control message from any of its neighbors. The first action taken by a node when entering the protocol is to send a control message containing its own identity to all its neighbors, thereby starting the propagation of this identity. In addition, whenever a node i receives the first control message with the identity of some other node j, it marks j as connected and sends a message $\mathrm{MSG}^j$ with the identity of j to all neighbors. All further messages with the identity of j are discarded with no action taken.

Variables of the algorithm at node i

$M_i$ - shows if i has already entered the algorithm (values NORMAL, WORK);

$d_i^j$ - shows if i knows whether j is connected (values 0,1), for $j = 1, 2, \ldots, |V|$, $j \neq i$.

Messages sent and received by the algorithm at node i

$\mathrm{MSG}^j$ - control message with identity j sent by i;

$\mathrm{MSG}^j(l)$ - message with identity j received by i from l;

START - same meaning as in PI.

Algorithm for node i

Assumption: just before entering the protocol, node i has $M_i = $ NORMAL and $d_i^j = 0$ for all j.

1. For START or $\mathrm{MSG}^j(l)$

2. if $M_i = $ NORMAL, then: $M_i \leftarrow$ WORK; send $\mathrm{MSG}^i$ to all neighbors.

3. if $\mathrm{MSG}^j(l)$ with $j \neq i$ and $d_i^j = 0$, then: $d_i^j \leftarrow 1$; send $\mathrm{MSG}^j$ to all neighbors.

Theorem  CT1-1 

If node j is connected to i and START is delivered to any node connected to j (or to j itself), then $d_j^i$ will become 1 in finite time; if i and j belong to disconnected networks, then $d_j^i$ will remain 0 forever.

Proof 

The event WORK propagates as in PI and hence will happen in finite time at all nodes k connected to the node that received START. For a given i, after $M_i$ becomes WORK, the event $d_k^i \leftarrow 1$ propagates again as in PI and hence will happen in finite time at node j. The second part of the Theorem is obvious, since messages carrying the identity of i travel only on links of the connected network containing i.
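A minimal simulation sketch of CT1, again with a hypothetical `Net` simulator and a hypothetical `run_ct1` function. Here `d[i]` stores the identities j with $d_i^j = 1$; the explicit `j != i` test reflects the convention that $d_i^j$ is defined only for $j \neq i$, so a node never re-propagates its own identity.

```python
import heapq
import random

class Net:
    """Hypothetical simulator: FIFO links with random finite delays (assumption c),
    zero-time sequential processing at each node (assumption d)."""
    def __init__(self, adj, seed=0):
        self.adj, self.rng, self.now = adj, random.Random(seed), 0.0
        self.queue, self.seq, self.last, self.sent = [], 0, {}, {}

    def _push(self, t, dst, src, msg):
        heapq.heappush(self.queue, (t, self.seq, dst, src, msg))
        self.seq += 1

    def send(self, src, dst, msg):
        t = max(self.now + self.rng.uniform(0.1, 1.0), self.last.get((src, dst), 0.0))
        self.last[(src, dst)] = t
        self.sent[(src, dst)] = self.sent.get((src, dst), 0) + 1
        self._push(t, dst, src, msg)

    def run(self, handler, starters):
        for s in starters:
            self._push(0.0, s, None, "START")
        while self.queue:
            self.now, _, dst, src, msg = heapq.heappop(self.queue)
            handler(dst, src, msg)

def run_ct1(adj, starters, seed=0):
    """Protocol CT1; d[i] is the set of j != i with d_i^j = 1."""
    net = Net(adj, seed)
    M = {i: "NORMAL" for i in adj}
    d = {i: set() for i in adj}
    def handler(i, l, msg):
        if M[i] == "NORMAL":                    # line 2: enter, propagate own identity
            M[i] = "WORK"
            for k in adj[i]:
                net.send(i, k, ("MSG", i))
        if msg != "START":
            j = msg[1]
            if j != i and j not in d[i]:        # line 3: first MSG^j with j != i
                d[i].add(j)
                for k in adj[i]:
                    net.send(i, k, ("MSG", j))
    net.run(handler, starters)
    return d, net.sent

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
d, sent = run_ct1(adj, starters=[2])
print(sorted(d[4]))  # [1, 2, 3]
```

Every directed link inside the component carries each of the four identities once, while nodes 5 and 6 never hear anything, which is exactly why node 1 alone cannot tell when to stop waiting (Theorem CT1-2).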

Theorem  CT1-2 

With protocol CT1, there is no way for node j to know for sure what nodes are disconnected from it or, in other words, there is no way for j to know when the algorithm is completed, except for the case when all nodes are connected.

Proof 

Consider first the case of three nodes 1, 2, 3 with links (1,2) and (2,3). If 1 starts the protocol, it will receive the same sequence of messages whether (2,3) is working or not, except that if it is, it will later receive the identity of 3. Now, after receiving the identity of node 2 and before receiving the identity of 3, there is no way for node 1 to know positively whether it has already completed the protocol or not, i.e. whether new identities are still supposed to arrive. It is easy to see that similar situations may arise for any other topology.
Communication cost

The number of bits transmitted on each link in each direction is $|V| \log_2 |V|$. This is because every identity travels exactly once on each link in each direction, there are $|V|$ identities, and it takes $\log_2 |V|$ bits to describe an identity. The total number of bits in the network is $2|E|\,|V| \log_2 |V|$, where $|E|$ is the number of bidirectional links.
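The cost formulas can be evaluated directly. The sketch below uses a hypothetical function name `ct1_bits` and keeps $\log_2 |V|$ real-valued as in the text; a concrete encoding would round it up to a whole number of bits.

```python
import math

def ct1_bits(num_nodes, num_links):
    """CT1 cost: (bits per link per direction, total bits in the network)."""
    per_link = num_nodes * math.log2(num_nodes)   # |V| log2|V|
    total = 2 * num_links * per_link              # 2|E| directed link uses
    return per_link, total

print(ct1_bits(8, 12))  # (24.0, 576.0)
```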

The rest of this section is devoted to the presentation of several protocols that solve the problem raised in Theorem CT1-2, namely they allow nodes to know positively that the protocol has indeed been completed. We shall say then that the protocol has the termination property. Protocol CT2 achieves this property by employing the basic protocol PIF, while the others use a different idea.

Protocol  CT2 

The  protocol  is  started  and  entered  by  nodes  in  the  same  way  as  in  CT1. 

Whenever a node i receives the first message $\mathrm{MSG}^j$ with the identity of j, from neighbor l say, node i denotes this neighbor (as in PIF) with a special mark $p_i^j$ and sends $\mathrm{MSG}^j$ to all neighbors except $p_i^j$. When it observes that it has received $\mathrm{MSG}^j$ (for $j \neq i$) from all neighbors, node i sends $\mathrm{MSG}^j$ to $p_i^j$. The termination property holds because, as shown below, receipt of $\mathrm{MSG}^i$ from all neighbors can be interpreted as the signal that node i positively knows the nodes that are connected to it and also the nodes that are disconnected.

Variables of the algorithm at node i

$M_i$ = WORK while i is participating in the protocol and = NORMAL after completing the protocol;

$d_i^j$ shows if i knows whether j is connected (values 0,1), for all j;

$N_i^j(l)$ shows if $\mathrm{MSG}^j$ has already been received from neighbor l (values 0,1), for all j and $l \in G_i$;

$p_i^j$ - neighbor from which $\mathrm{MSG}^j$ has been received first, for all j.

Messages received and sent by the algorithm at node i

Same as in CT1.

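Algorithm for node i

Assumption: just before node i enters the algorithm, it has $d_i^j = 0$ and $N_i^j(l) = 0$ for all j and $l \in G_i$.

1. For START or $\mathrm{MSG}^j(l)$, $j \neq i$

1a. if MSG, then: $N_i^j(l) \leftarrow 1$.

2. if $d_i^i = 0$, then: $M_i \leftarrow$ WORK; $d_i^i \leftarrow 1$; send $\mathrm{MSG}^i$ to all neighbors.

3. if MSG and $d_i^j = 0$, then: $d_i^j \leftarrow 1$; $p_i^j \leftarrow l$; send $\mathrm{MSG}^j$ to all neighbors except $p_i^j$.

4. if $\forall l' \in G_i$ holds $N_i^j(l') = 1$, then: send $\mathrm{MSG}^j$ to $p_i^j$; $\forall l' \in G_i$, set $N_i^j(l') \leftarrow 0$.

5. For $\mathrm{MSG}^i(l)$

5a. $N_i^i(l) \leftarrow 1$;

6. if $\forall l' \in G_i$ holds $N_i^i(l') = 1$, then: $M_i \leftarrow$ NORMAL; $\forall l' \in G_i$, set $N_i^i(l') \leftarrow 0$.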

In order to analyse the protocol, we shall need the following notation (see also the notation just before Theorem PIF-1):

$\langle * \rangle_i^j$ - the event of node i performing line <*> of its algorithm regarding node j (i.e. reacting to receipt of $\mathrm{MSG}^j$).

The  properties  of  the  algorithm  are  given  in  the  following: 

Theorem  CT2-1 

Suppose START is delivered to any node connected to a given node j (or to j itself). Then:

a) same as Theorem CT1-1;

b) node j will perform $\langle 6\rangle_j$ in finite time and exactly once, and when this happens, it will have $d_j^k = 1$ for all connected nodes k and $d_j^k = 0$ for all disconnected nodes k. In other words, it will positively know at that time what nodes are connected, resolving the problem raised in Theorem CT1-2.


Proof 

The event WORK propagates as in PI and hence will happen in finite time at all nodes k connected to the node that received START. For a given node i, after $d_i^i$ becomes 1, the event $d_k^i \leftarrow 1$ (i.e. $\langle 3\rangle_k^i$ in the present protocol) propagates in the same way as $\langle 3\rangle_k$ in PIF and hence (cf. Thm. PIF-1) it will happen in finite time at node j, completing the proof of a). Similarly, for the given node i, $\langle 4\rangle_k^i$ propagates in the same way as $\langle 4\rangle_k$ in PIF, and hence $\langle 4\rangle_k^i$ (for all $k \neq i$) and $\langle 6\rangle_i$ will happen in finite time, each exactly once. It remains to show that $\langle 6\rangle_j$ is indeed the signal indicating that node j knows all connected nodes, namely to show that

$$t(\langle 3\rangle_j^k) < t(\langle 6\rangle_j) \tag{4.1}$$

for all nodes k connected to j. For given k and j, consider the nodes $k = i_0, i_1, \ldots, i_r = j$, where $i_{l+1} = p_{i_l}^j$ for $l = 0, \ldots, r-1$. In words, this is the branch of the tree rooted at j referred to in Thm. PIF-1 on which node k sits. We wish to prove (4.1) by induction on the mentioned series of nodes, namely we want to prove by induction that

$$t(\langle 3\rangle_i^k) < t(\langle 4\rangle_i^j) \quad \text{for } i = k, i_1, \ldots, i_{r-1}, j. \tag{4.2}$$

Observe that $\langle 3\rangle_k^k$ is not defined in the algorithm and is used here for convenience of notation to mean $\langle 2\rangle_k$; similarly, $\langle 4\rangle_j^j$ means $\langle 6\rangle_j$.

Since $\langle 2\rangle_k = \langle 3\rangle_k^k$ is the first operation at node k, expression (4.2) is clearly true for $i = k = i_0$. Now the induction will be complete if we prove that, for any node i on the branch other than j, the fact

$$t(\langle 3\rangle_i^k) < t(\langle 4\rangle_i^j) \tag{4.3}$$

implies

$$t(\langle 3\rangle_{p_i^j}^k) < t(\langle 4\rangle_{p_i^j}^j). \tag{4.4}$$

We distinguish two cases. Suppose first that $p_i^k = p_i^j$. Then (3.1) applied to k implies

$$t(\langle 3\rangle_i^k) > t(\langle 3\rangle_{p_i^k}^k) \tag{4.5}$$

and (3.2) applied to j implies

$$t(\langle 4\rangle_i^j) < t(\langle 4\rangle_{p_i^j}^j). \tag{4.6}$$

These, combined with (4.3) and the fact $p_i^k = p_i^j$, imply (4.4). Suppose next that $p_i^k \neq p_i^j$. Let us denote by $\mathrm{SEND}_i^k(l)$ and $\mathrm{RCV}_i^k(l)$ the events of node i sending/receiving $\mathrm{MSG}^k$ to/from neighbor l, respectively. At $\langle 3\rangle_i^k$ node i sends $\mathrm{MSG}^k$ to all neighbors except $p_i^k$, hence also to $p_i^j$; so <3> and Assumption d) in Sec. 2 imply that

$$t(\langle 3\rangle_i^k) = t(\mathrm{SEND}_i^k(p_i^j)) \tag{4.7}$$

and <4> says that

$$t(\langle 4\rangle_i^j) = t(\mathrm{SEND}_i^j(p_i^j)). \tag{4.8}$$

Now (4.3), Assumption c) in Sec. 2 and (4.7), (4.8) imply

$$t(\mathrm{RCV}_{p_i^j}^k(i)) < t(\mathrm{RCV}_{p_i^j}^j(i)). \tag{4.9}$$

But

$$t(\langle 3\rangle_{p_i^j}^k) \le t(\mathrm{RCV}_{p_i^j}^k(i)) \tag{4.10}$$

since <3> is performed when the first $\mathrm{MSG}^k$ is received, and similarly,

$$t(\langle 4\rangle_{p_i^j}^j) \ge t(\mathrm{RCV}_{p_i^j}^j(i)) \tag{4.11}$$

since $\langle 4\rangle_{p_i^j}^j$ is performed after having received $\mathrm{MSG}^j$ from all neighbors.

Now, (4.9)-(4.11) imply (4.4), and this completes the induction and the proof of the Theorem.
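A minimal simulation sketch of CT2, under the same illustrative assumptions (`Net` and `run_ct2` are hypothetical names, and `Net` is repeated so the block runs on its own). It snapshots the set $\{j : d_i^j = 1\}$ at the moment node i performs <6>; by Theorem CT2-1 this should be exactly the connected network containing i.

```python
import heapq
import random

class Net:
    """Hypothetical simulator: FIFO links with random finite delays (assumption c),
    zero-time sequential processing at each node (assumption d)."""
    def __init__(self, adj, seed=0):
        self.adj, self.rng, self.now = adj, random.Random(seed), 0.0
        self.queue, self.seq, self.last, self.sent = [], 0, {}, {}

    def _push(self, t, dst, src, msg):
        heapq.heappush(self.queue, (t, self.seq, dst, src, msg))
        self.seq += 1

    def send(self, src, dst, msg):
        t = max(self.now + self.rng.uniform(0.1, 1.0), self.last.get((src, dst), 0.0))
        self.last[(src, dst)] = t
        self.sent[(src, dst)] = self.sent.get((src, dst), 0) + 1
        self._push(t, dst, src, msg)

    def run(self, handler, starters):
        for s in starters:
            self._push(0.0, s, None, "START")
        while self.queue:
            self.now, _, dst, src, msg = heapq.heappop(self.queue)
            handler(dst, src, msg)

def run_ct2(adj, starters, seed=0):
    """Protocol CT2; returns {i: set of j with d_i^j = 1 at <6>_i} and link counts."""
    net = Net(adj, seed)
    M = {i: "NORMAL" for i in adj}
    d = {i: set() for i in adj}                  # j in d[i]  <=>  d_i^j = 1
    N = {i: {} for i in adj}                     # N[i][j][l] = N_i^j(l)
    p = {i: {} for i in adj}                     # p[i][j] = p_i^j
    known = {}
    def mark(i, j, l):                           # lines 1a / 5a; True if all N_i^j = 1
        N[i].setdefault(j, {k: 0 for k in adj[i]})[l] = 1
        return all(N[i][j].values())
    def handler(i, l, msg):
        j = None if msg == "START" else msg[1]
        if j == i:                               # lines 5, 5a, 6
            if mark(i, i, l):
                M[i] = "NORMAL"
                known[i] = set(d[i])             # snapshot at the termination signal
            return
        complete = j is not None and mark(i, j, l)
        if i not in d[i]:                        # line 2: d_i^i = 0
            M[i] = "WORK"
            d[i].add(i)
            for k in adj[i]:
                net.send(i, k, ("MSG", i))
        if j is not None and j not in d[i]:      # line 3: first MSG^j
            d[i].add(j)
            p[i][j] = l
            for k in adj[i]:
                if k != l:
                    net.send(i, k, ("MSG", j))
        if complete:                             # line 4: feedback to p_i^j
            net.send(i, p[i][j], ("MSG", j))
            N[i][j] = {k: 0 for k in adj[i]}
    net.run(handler, starters)
    return known, net.sent

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
known, sent = run_ct2(adj, starters=[3, 6])
print(sorted(known[1]), sorted(known[5]))  # [1, 2, 3, 4] [5, 6]
```

The per-link counts in `sent` match the CT1 figure: one message per identity per link per direction.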

Communication  cost 

Observe that, by Theorem CT2-1, the communication requirements of CT2 are the same as those of CT1, namely $|V| \log_2 |V|$ bits per link in each direction. Observe, however, that the storage and processing requirements, as well as the required execution time, are larger than in CT1.

Protocols  CT3  -  CT5  use  a  different  idea  for  achieving  the  termination 
property.  CT3  is  quite  wasteful  in  terms  of  communication  requirements, 
but  it  is  convenient  in  order  to  illustrate  the  idea  and  to  be  used  as  a 
basis  for  developing  the  more  efficient  versions  CT4  and  CT5.  In  addition, 
it  can  be  used  for  different  purposes,  like  learning  the  network  topology. 

Protocol  CT3 

Suppose we use protocol CT1, except that for each node we propagate not only the identity of the node, but also the identities of its neighbors. In other words, $\mathrm{MSG}^j$ of CT1 will now carry the identity of j as well as of all its neighbors, i.e. will have the format $\mathrm{MSG}^j(K_j)$, where $K_j$ contains the identities of all neighbors of j. The termination property is achieved using the fact that, if a node k receives $\mathrm{MSG}^j(K_j)$, it will eventually receive $\mathrm{MSG}^i(K_i)$ as well, for any $i \in K_j$, and the termination signal will occur when node k has heard from all these nodes. Clearly, the algorithm at each node will have two stages: in the first one it will learn the identities of its own neighbors, and in the second it will proceed with the protocol as described before. In the description of the protocol, we shall use a special notation WAKE for messages belonging to the first stage.

Variables of the algorithm at node i

$M_i$ - same meaning as in CT2;

$d_i^i$ = 0 before entering the algorithm, 1 while looking for the identities of neighbors, 2 while looking for all connected nodes;

$d_i^j$ (for $j \neq i$) = 0 when i knows nothing about j, 1 while i knows j only as a neighbor of another node, 2 while i knows j directly (i.e. $\mathrm{MSG}^j(K_j)$ has been received);

$N_i(l)$ shows if WAKE has been received from neighbor l (values 0,1);

$K_i$ is the list containing the identities of all neighbors of i.

Messages received and sent by the algorithm at node i

$\mathrm{MSG}^i(K_i)$ - message containing the identities of i and of its neighbors;

$\mathrm{WAKE}^i$ - message asking the neighbors to wake up and to send their identity;

START - as before.

Similarly, $\mathrm{MSG}^j(K_j)$ and $\mathrm{WAKE}^l$ for received messages.


Algorithm for node i

Assumption: just before node i enters the algorithm, it has $K_i$ = empty and $d_i^j = 0$, $N_i(l) = 0$ for all j and $l \in G_i$.

1. For START

1a. $d_i^i \leftarrow 1$; $M_i \leftarrow$ WORK; send $\mathrm{WAKE}^i$ to all neighbors.

2. For $\mathrm{WAKE}^l$

2a. $N_i(l) \leftarrow 1$; include l in $K_i$;

2b. if $d_i^i = 0$, then: same as <1a>;

2c. $d_i^l \leftarrow \max\{d_i^l, 1\}$;

3. if $\forall l' \in G_i$ holds $N_i(l') = 1$, then:

3a. $d_i^i \leftarrow 2$; $\forall l' \in G_i$, set $N_i(l') \leftarrow 0$; send $\mathrm{MSG}^i(K_i)$ to all neighbors.

4. For $\mathrm{MSG}^j(K_j)$ and $M_i$ = WORK

5. if $d_i^j \neq 2$, then: $d_i^j \leftarrow 2$; $\forall k \in K_j$ set $d_i^k \leftarrow \max\{d_i^k, 1\}$; send $\mathrm{MSG}^j(K_j)$ to all neighbors.

6. if $\forall j$ holds $d_i^j = 2$ or 0, then $M_i \leftarrow$ NORMAL.

The  properties  of  the  protocol  are  given  in  the  following: 

Theorem  CT3 

Suppose START is delivered to one or more nodes. Then

a) exactly one message WAKE traverses each link in each direction;

b) <3> happens at all connected nodes in finite time and exactly once;

c) when $\mathrm{MSG}^i(K_i)$ is sent by node i (see <3a>), $K_i$ contains exactly the identities of all neighbors of i;

d) for each node j in the connected network, exactly one message $\mathrm{MSG}^j(K_j)$ traverses each link in each direction;

e) every node i will perform <6> (i.e. $M_i \leftarrow$ NORMAL) in finite time, and when this happens it will have $d_i^j = 2$ for all connected nodes j and $d_i^j = 0$ for all disconnected nodes j (i.e. this is the termination signal).


Proof 

The propagation of WAKE happens as in protocol PI, and hence a) and b) hold. Now, c) is clear from condition <3>. For each j, propagation of $\mathrm{MSG}^j(K_j)$ happens as in protocol PI except that it is triggered by <3> instead of by START, and hence d) holds. In order to prove e), consider the situation after all messages considered in d) have arrived. Then from <5>, a node i will have $d_i^j = 2$ for all nodes j connected to it. For all disconnected nodes j, it will have $d_i^j = 0$, and hence <6> will be performed. It remains to prove that there cannot be a situation where <6> holds while $d_i^j = 0$ for some connected node j. If this were the case, there would exist a set of nodes V' containing i, where V' is not the entire connected network, such that $d_i^k = 2$ for $k \in V'$, while $d_i^k = 0$ for $k \notin V'$. But then <5> shows that $d_i^k \ge 1$ for all k that are neighbors of any node in V'; since the network is connected, at least one such neighbor lies outside V', contradicting $d_i^k = 0$ for all $k \notin V'$. This completes the proof of e) and of the theorem.
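A minimal simulation sketch of CT3 with the same hypothetical simulator (`Net`, `run_ct3` and the tuple message formats `("WAKE", i)` and `("MSG", j, K_j)` are illustrative assumptions). `known[i]` is the set of j with $d_i^j = 2$ at the moment line 6 fires; by Theorem CT3 a) and d), each directed link inside a component should carry one WAKE plus one MSG per node of the component.

```python
import heapq
import random

class Net:
    """Hypothetical simulator: FIFO links with random finite delays (assumption c),
    zero-time sequential processing at each node (assumption d)."""
    def __init__(self, adj, seed=0):
        self.adj, self.rng, self.now = adj, random.Random(seed), 0.0
        self.queue, self.seq, self.last, self.sent = [], 0, {}, {}

    def _push(self, t, dst, src, msg):
        heapq.heappush(self.queue, (t, self.seq, dst, src, msg))
        self.seq += 1

    def send(self, src, dst, msg):
        t = max(self.now + self.rng.uniform(0.1, 1.0), self.last.get((src, dst), 0.0))
        self.last[(src, dst)] = t
        self.sent[(src, dst)] = self.sent.get((src, dst), 0) + 1
        self._push(t, dst, src, msg)

    def run(self, handler, starters):
        for s in starters:
            self._push(0.0, s, None, "START")
        while self.queue:
            self.now, _, dst, src, msg = heapq.heappop(self.queue)
            handler(dst, src, msg)

def run_ct3(adj, starters, seed=0):
    net = Net(adj, seed)
    M = {i: "NORMAL" for i in adj}
    d = {i: {} for i in adj}                     # d[i].get(j, 0) = d_i^j in {0, 1, 2}
    N = {i: {l: 0 for l in adj[i]} for i in adj}
    K = {i: set() for i in adj}
    known = {}
    def wake(i):                                 # line 1a
        d[i][i], M[i] = 1, "WORK"
        for k in adj[i]:
            net.send(i, k, ("WAKE", i))
    def handler(i, l, msg):
        if msg == "START":                       # line 1
            wake(i)
            return
        if msg[0] == "WAKE":                     # line 2
            N[i][l] = 1                          # 2a
            K[i].add(l)
            if d[i].get(i, 0) == 0:              # 2b
                wake(i)
            d[i][l] = max(d[i].get(l, 0), 1)     # 2c
            if all(N[i].values()):               # line 3
                d[i][i] = 2                      # 3a
                N[i] = {k: 0 for k in adj[i]}
                for k in adj[i]:
                    net.send(i, k, ("MSG", i, frozenset(K[i])))
            return
        _, j, Kj = msg
        if M[i] != "WORK":                       # line 4
            return
        if d[i].get(j, 0) != 2:                  # line 5
            d[i][j] = 2
            for k in Kj:
                d[i][k] = max(d[i].get(k, 0), 1)
            for k in adj[i]:
                net.send(i, k, ("MSG", j, Kj))
        if all(v != 1 for v in d[i].values()):   # line 6: every d_i^j is 0 or 2
            M[i] = "NORMAL"
            known[i] = {k for k, v in d[i].items() if v == 2}
    net.run(handler, starters)
    return known, net.sent

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
known, sent = run_ct3(adj, starters=[2, 5])
print(sorted(known[4]))  # [1, 2, 3, 4]
```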

Communication  cost 

On each link in each direction we need $\log_2|V|$ bits for the WAKE message and $|V|(D+1)\log_2|V|$ bits for the MSG messages, where D is the average degree of the nodes (average number of neighbors). Clearly $D = 2|E|/|V|$, and hence the communication cost is $(2|E| + |V| + 1)\log_2|V|$ bits per link in each direction.
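For comparison with CT1, a small sketch evaluating the CT3 cost formula term by term (the function name `ct3_bits_per_link` is hypothetical; $\log_2|V|$ is kept real-valued as in the text):

```python
import math

def ct3_bits_per_link(num_nodes, num_links):
    D = 2 * num_links / num_nodes                       # average degree
    wake = math.log2(num_nodes)                         # one WAKE per link per direction
    msgs = num_nodes * (D + 1) * math.log2(num_nodes)   # |V|(D + 1) log2|V|
    return wake + msgs

V, E = 8, 12
print(ct3_bits_per_link(V, E))  # 99.0, i.e. (2|E| + |V| + 1) log2|V|
```

For the same network, CT1 needs $|V|\log_2|V| = 24$ bits per link per direction, so CT3 costs roughly four times as much here.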

As mentioned before, protocol CT3 employs too much communication and its performance can be considerably improved. One way is to use the position of a variable in a vector to indicate the identity of a node, instead of mentioning it explicitly. This idea was used in a protocol by Finn [3], and we present here an improved version of that protocol:

Protocol  CT4 

Variables  of  the  algorithm  are  d*  ,  d|  ,  N. (£),  M^  with  the  same  meaning  as 
in  CT3. 
Messages sent and received by the algorithm at node i
START;
$D_i = \{d_i^1, d_i^2, \ldots, d_i^{|V|}\}$ - message sent;
$D(\ell)$ - message received from neighbor $\ell$; we denote its contents by $\{d^1, d^2, \ldots, d^{|V|}\}$.


Algorithm for node i

Assumption: just before node $i$ enters the algorithm, it has all $d_i^j = 0$ and $N_i(\ell) = 0$ for all $j$ and $\ell \in G_i$.

1. For START
1a. $d_i^i \leftarrow 1$; $M_i \leftarrow$ WORK; send $D_i$ to all neighbors.
2. For $D(\ell)$
2a. if $D(\ell) = \{0, 0, \ldots, 0, 1, 0, \ldots, 0\}$, then
2b. $N_i(\ell) \leftarrow 1$;
2c. if $d_i^i = 0$, then: same as <1a>;
2d. $\forall k$, set $d_i^k \leftarrow \max\{d_i^k, d^k\}$;
3. if $\forall \ell' \in G_i$ holds $N_i(\ell') = 1$, then:
3a. $d_i^i \leftarrow 2$; $\forall \ell' \in G_i$ set $N_i(\ell') \leftarrow 0$; send $D_i$ to all neighbors;
4. if $D(\ell) \neq \{0, 0, \ldots, 0, 1, 0, \ldots, 0\}$ and $d_i^i = 1$, then:
4a. $\forall k$ set $d_i^k \leftarrow \max\{d_i^k, d^k\}$;
5. else, if $\exists j$ such that $d^j = 2 > d_i^j$, then:
5a. $\forall k$ set $d_i^k \leftarrow \max\{d_i^k, d^k\}$; send $D_i$ to all neighbors.
6. if $\forall j$ holds $d_i^j = 2$ or $0$, then $M_i \leftarrow$ NORMAL.

Observe that the message $\{0, 0, \ldots, 0, 1, 0, \ldots, 0\}$ replaces WAKE of protocol CT3. Note also that Finn's [3] protocol requires a node to send messages every time its table is updated, while here messages are sent only when relevant new information is received (see <4a>, <5>). In this sense, the present version is more efficient than [3]. The properties of the protocol are summarized in

Theorem CT4

Suppose START is delivered to some node. Then

a) exactly one message $\{0, 0, \ldots, 0, 1, 0, \ldots, 0\}$ traverses each link in each direction, and this is the first message on each link;

b) <3> happens at all connected nodes exactly once, and then $d_i^j \ge 1$ for all neighbors $j$ of $i$;

c) no more than $|V|$ messages with format $\neq \{0, 0, \ldots, 1, 0, \ldots, 0\}$ traverse each link in each direction;

d) same as e) in Theorem CT3.

Proof

Lines <1>, <2c> and the fact that the propagation is as in PI imply a) and b). From the algorithm it is clear that $d_i^j$ can only increase, and from <3a> and <5a> it follows that a message $\neq \{0, 0, \ldots, 1, 0, \ldots, 0\}$ can be sent by $i$ only when some $d_i^j$ is set from 0 or 1 to 2, and this can happen only once for each $j$. Hence c). Finally, d) follows in the same way as e) in Theorem CT3.
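To see protocol CT4 operate under arbitrary delays, the following Python sketch simulates it with an event queue. It is an illustration under stated assumptions, not the paper's code: nodes are numbered $0, \ldots, |V|-1$ so that vector position $j$ stands for node $j$, links are assumed to deliver messages in order (FIFO) with random delays, and the names `simulate_ct4`, `send_all`, `merge` and `handle` are hypothetical. The comments refer to the numbered lines of the algorithm.

```python
import heapq
import random


def simulate_ct4(adj, starters, seed=0):
    """Event-driven simulation of protocol CT4 (illustrative sketch).

    adj: dict mapping node -> list of neighbors, nodes numbered 0..n-1.
    Assumes FIFO links with random delays. Returns (d, M): d[i] is node
    i's vector and M[i] its mode.
    """
    rng = random.Random(seed)
    n = len(adj)
    d = {i: [0] * n for i in adj}
    N = {i: {l: 0 for l in adj[i]} for i in adj}
    M = {i: "NORMAL" for i in adj}
    events, last_arrival, seq = [], {}, 0

    def send_all(i, now):
        # Send a snapshot of D_i to every neighbor; FIFO is kept by never
        # scheduling an arrival earlier than the previous one on the link.
        nonlocal seq
        for l in adj[i]:
            t = max(now + rng.uniform(1e-3, 1.0), last_arrival.get((i, l), 0.0))
            last_arrival[(i, l)] = t
            seq += 1
            heapq.heappush(events, (t, seq, l, i, tuple(d[i])))

    def merge(i, vec):
        d[i] = [max(a, b) for a, b in zip(d[i], vec)]

    def wake(i, now):                                  # <1a>
        d[i][i], M[i] = 1, "WORK"
        send_all(i, now)

    def handle(i, l, vec, now):
        if sum(vec) == 1:                              # <2a> single 1: wake vector
            N[i][l] = 1                                # <2b>
            if d[i][i] == 0:                           # <2c>
                wake(i, now)
            merge(i, vec)                              # <2d>
            if all(N[i][x] == 1 for x in adj[i]):      # <3>
                d[i][i] = 2                            # <3a>
                for x in adj[i]:
                    N[i][x] = 0
                send_all(i, now)
        elif d[i][i] == 1:                             # <4>
            merge(i, vec)                              # <4a> nothing sent yet
        elif any(b == 2 > a for a, b in zip(d[i], vec)):  # <5>
            merge(i, vec)                              # <5a>
            send_all(i, now)
        if all(x in (0, 2) for x in d[i]):             # <6>
            M[i] = "NORMAL"

    for s in starters:                                 # START
        if d[s][s] == 0:
            wake(s, 0.0)
    while events:
        now, _, i, l, vec = heapq.heappop(events)
        handle(i, l, vec, now)
    return d, M
```

On the graph with links 0-1, 0-2, 1-2, 2-3 and 4-5, with START delivered only to node 0, every node of the component {0, 1, 2, 3} ends with the vector [2, 2, 2, 2, 0, 0] in mode NORMAL, while nodes 4 and 5 never enter the algorithm.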


Communication cost

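Each message contains $2|V|$ bits (two bits per entry, since each $d_i^j$ takes one of the values 0, 1, 2), and by Theorem CT4 a) and c) at most $|V| + 1$ messages traverse each link in each direction; hence at most $2|V|(|V| + 1)$ bits will travel on each link in each direction.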
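The two-bits-per-entry encoding behind this bound can be made concrete. The Python sketch below is one possible encoding of our own choosing (the paper does not prescribe a wire format): entry $j$ of the vector occupies bits $2j$ and $2j+1$ of an integer.

```python
def pack(vector):
    """Pack a CT4 vector (entries 0, 1 or 2) into an int, 2 bits per entry.

    Entry j occupies bits 2j and 2j+1.
    """
    word = 0
    for j, value in enumerate(vector):
        if value not in (0, 1, 2):
            raise ValueError(f"entry {j} is {value}, expected 0, 1 or 2")
        word |= value << (2 * j)
    return word


def unpack(word, num_nodes):
    """Inverse of pack for a vector with num_nodes entries."""
    return [(word >> (2 * j)) & 0b11 for j in range(num_nodes)]


def ct4_max_bits_per_link(num_nodes):
    """Upper bound 2|V|(|V| + 1) on the bits per link in each direction."""
    return 2 * num_nodes * (num_nodes + 1)
```

For example, `pack([2, 1, 0, 2])` gives 134, which fits in the $2 \cdot 4 = 8$ bits allotted to a four-node vector.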

Protocol CT3 can be improved in another way, resulting in a more efficient protocol CT5.


Protocol CT5

Consider protocol CT3 with the following variation.

Whenever receiving $\mathrm{MSG}^j(K_j)$, a node $i$ consults its table containing $\{d_i^j\}$. If $d_i^j = 2$, the MSG is discarded, since such a MSG has been previously received and forwarded to all neighbors; this part is the same as in CT3. If $d_i^j < 2$, then $d_i^j \leftarrow 2$ and the MSG is sent to all neighbors, but now, before sending $\mathrm{MSG}^j(K_j)$, the following pruning operation is performed:

For all $k \in K_j$, if $d_i^k \ge 1$, then $k$ is deleted from $K_j$; otherwise $k$ is not deleted from $K_j$ and the variable $d_i^k$ receives value 1. Then $\mathrm{MSG}^j(K_j)$ is sent to all neighbors. Node $k$ can indeed be deleted when $d_i^k \ge 1$, because in this case $k$ has been sent before by $i$ to its neighbors, either as a neighbor of some node, in which case $d_i^k = 1$, or in $\mathrm{MSG}^k$, in which case $d_i^k = 2$. One way or the other, there is no need to send $k$ again. All properties of CT3 hold here as well, but the pruning operation ensures that the identity of each node $k$ travels no more than twice on each link in each direction: once as a neighbor of some node and once in $\mathrm{MSG}^k$. Hence the communication cost is bounded by $2|V|\log_2|V|$ bits per link in each direction.
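The pruning step is local to node $i$ and fits in a few lines. In this Python sketch (the function name and the dictionary representation are our assumptions), $i$'s table $\{d_i^k\}$ is a dict in which a missing entry means 0.

```python
def ct5_on_msg(d_i, j, K_j):
    """Handle MSG^j(K_j) at node i under the CT5 pruning rule (sketch).

    d_i: dict node -> value in {0, 1, 2}; a missing entry means 0.
    Returns the pruned list to forward to all neighbors, or None when
    the MSG is discarded because d_i^j = 2 already.
    """
    if d_i.get(j, 0) == 2:
        return None                  # received and forwarded before: discard
    d_i[j] = 2
    pruned = []
    for k in K_j:
        if d_i.get(k, 0) >= 1:
            continue                 # k was already sent by i: delete it
        d_i[k] = 1                   # keep k and record that it is being sent
        pruned.append(k)
    return pruned
```

With a table where node `a` has value 2 and node `b` has value 1, a message $\mathrm{MSG}^c(\{a, b, e\})$ is forwarded as $\mathrm{MSG}^c(\{e\})$, and a second copy of $\mathrm{MSG}^c$ is discarded.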
Minimum-hop-path protocols

The problem considered next is to obtain the paths with the smallest number of links (hops) from each node to each other node. As before, at the beginning of the algorithm a node knows only its own identity and the adjacent links. When the algorithm is completed at a node $i$, we want the node to know its distance $d_i^k$, in terms of number of links, to all other nodes $k$ to which it is connected, and a preferred neighbor $p_i^k$ through which it has the minimum-hop path to $k$. Observe that we do not require nodes to know the entire minimum-hop path.

If the travel time of a control message were identical on all links, then we could have obtained the minimum-hop paths by using protocol PI (see Theorem PI-1 c)). However, as stated before, such an assumption is not practical, and the problem is to design a DNP where nodes will receive the first message with a given identity from the neighbor providing the shortest path, even if link delays are arbitrary. Such a protocol has been proposed by Gallager [1], and here we give its formal description and validation.

Protocol MH

A node enters the algorithm in the same way as in the CT protocols, namely when receiving START or the first control message, and at that time it sends its own identity to all neighbors. After having received the identities of all neighbors, node $i$ knows all nodes that are at distance 1 from it. Node $i$ keeps this information, sends it to all neighbors and then waits to receive the lists of all nodes that are at distance 1 from each of its neighbors.

The union of these lists minus the set of nodes already known to $i$, i.e. those that are at distance 0 or 1 from it, is exactly the set of nodes that are at distance 2 from $i$. This information is kept again at $i$ and also distributed to neighbors, and the procedure is repeated. If at some level the union of the lists received from all neighbors contains no nodes that are unknown to $i$, then node $i$ has completed the algorithm. It sends to all neighbors a message saying that it has no new node identities to send, and stops.

Any further message it may receive is disregarded.
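The level-by-level idea can first be seen in an idealized, synchronous setting in which all nodes advance one level per round. The Python sketch below is only that idealization (the function name is ours); it does not model arbitrary link delays, which is what the formal algorithm has to cope with.

```python
def synchronous_mh(adj):
    """Round-by-round idealization of the MH idea (all nodes in lockstep).

    adj: dict mapping node -> list of neighbors. Returns dist, where
    dist[i][k] is the hop-distance from i to every node k reachable from i.
    """
    dist = {i: {i: 0} for i in adj}
    current = {i: {i} for i in adj}        # level-0 lists: own identity only
    level = 0
    while any(current.values()):
        level += 1
        nxt = {}
        for i in adj:
            received = set()
            for m in adj[i]:
                received |= current[m]     # neighbor m's list from the last level
            new = received - set(dist[i])  # drop nodes already known to i
            for k in new:
                dist[i][k] = level
            nxt[i] = new                   # i's list for this level
        current = nxt
    return dist
```

On the path 0-1-2-3, node 0 learns node 1 at level 1, node 2 at level 2 and node 3 at level 3; its list at level 4 is empty, which is the stopping condition described above.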

Variables of the algorithm at node i

$d_i^k$ - distance from $i$ to $k$; set initially to $|V|$ for all $k$ (values $0, 1, \ldots, |V|$);
$p_i^k$ - preferred neighbor from $i$ to $k$, for all $k$;
$Z_i$ - state of node $i$ showing the distance covered by the protocol up to now (values $-1, 0, 1, \ldots, |V|-1$);
$M_i$ - shows if node $i$ is currently participating in the protocol (values NORMAL, WORK);
$N_i(\ell)$ - level of the last message received on link $(i,\ell)$ (values $-1, 0, \ldots, |V|-1$), for $\ell \in G_i$.

Messages sent and received at node i

$\mathrm{MSG}(\mathrm{LIST}_i)$ - message sent by node $i$;
$\mathrm{MSG}(\ell, \mathrm{LIST})$ - $\mathrm{MSG}(\mathrm{LIST})$ received on link $(i,\ell)$;
START.


Algorithm for node i

Assumption: just before node $i$ enters the algorithm, it has $p_i^k = $ nil, $d_i^k = |V|$ for all $k$, and $Z_i = N_i(m) = -1$ for all $m \in G_i$.

1. For START or $\mathrm{MSG}(\ell, \mathrm{LIST})$
2. if $Z_i = -1$, then: $d_i^i \leftarrow 0$; $M_i \leftarrow$ WORK; $Z_i \leftarrow 0$; $\mathrm{LIST}_i = \{i\}$; send $\mathrm{MSG}(\mathrm{LIST}_i)$ to all $m \in G_i$;
3. if MSG and $M_i = $ WORK, then
4. $N_i(\ell) \leftarrow N_i(\ell) + 1$;
5. $\forall k \in \mathrm{LIST}$:
5a. if $d_i^k > N_i(\ell) + 1$, then $d_i^k \leftarrow N_i(\ell) + 1$; $p_i^k \leftarrow \ell$;
6. if $Z_i = N_i(m)$ $\forall m \in G_i$, then
6a. $Z_i \leftarrow Z_i + 1$; $\mathrm{LIST}_i = \{k \mid d_i^k = Z_i\}$; send $\mathrm{MSG}(\mathrm{LIST}_i)$ to all neighbors;
7. if $\mathrm{LIST}_i = \emptyset$, then $M_i \leftarrow$ NORMAL.

Preliminary properties of the protocol are given in Lemma MH-1, while the main properties appear in Lemma MH-2 and in Theorem MH-1.

Lemma MH-1

Suppose START is delivered to a node (or several nodes). Then for any connected node $i$ the following hold:

a) $i$ will enter the protocol in finite time;

b) messages are sent by node $i$ if and only if $Z_i$ is incremented at the same time; if MSG is sent by $i$ while $Z_i = Z$, receipt of the MSG at neighbor $\ell$ will cause $N_\ell(i) \leftarrow Z$;

c) $Z_i$ and $N_i(m)$ for each $m \in G_i$ change only by increments of $+1$;

d) for each $m \in G_i$ holds $N_i(m) = Z_i$ or $Z_i \pm 1$, and there is at least one $m$ for which $N_i(m) = Z_i - 1$ (note: this implies $Z_i = \min_m N_i(m) + 1$);

e) no message can arrive on links $(i,m)$ for which $N_i(m) = Z_i + 1$;

f) if $Z_i$ is incremented at time $t$, then for all $m \in G_i$ holds $N_i(m)(t+) = Z_i(t+)$ or $Z_i(t+) - 1$.

Proof

a) holds since propagation of <2> happens as in PI. Assertion b) holds since $Z_i$ is incremented whenever MSG is sent (<2>, <6>), $N_i(\ell)$ is incremented whenever MSG is received from $\ell$ (<4>), and both are initialized to $-1$. In addition, c) follows from <2>, <4> and <6a>. Property d) is true immediately after the time node $i$ enters the algorithm, at which time either $Z_i = 0$ and $\min_m N_i(m) = -1$, or $Z_i = 1$ and $\min_m N_i(m) = 0$, the latter if $i$ has only one neighbor and enters the algorithm by receiving MSG from it. Suppose now that the property is true at node $i$ up to time $t-$, and we want to show that it will hold at time $t+$ as well. The variables $N_i(\cdot)$ or $Z_i$ can change at time $t$ only if a MSG is received, from neighbor $\ell$ say. Let $Z_i(t-) = Z$. We have several cases:

i) $N_i(\ell)(t-) = Z - 1$ and $\exists m \neq \ell$ with $N_i(m)(t-) = Z - 1$; then $N_i(\ell)(t+) = Z$, $Z_i(t+) = Z$, and all other $N_i(\cdot)$ do not change; hence d) continues to hold at time $t+$;

ii) $N_i(\ell)(t-) = Z - 1$ and $\nexists m \neq \ell$ with $N_i(m)(t-) = Z - 1$; then $N_i(\ell)(t+) = Z$ and $Z_i(t+) = Z + 1$, since <6> holds at $t$, and d) continues to hold at $t+$;

iii) $N_i(\ell)(t-) = Z$, in which case $N_i(\ell)(t+) = Z + 1$ and $Z_i(t+) = Z$; hence d) continues to hold at time $t+$;

iv) we claim that $N_i(\ell)(t-)$ cannot be $Z + 1$. Suppose $N_i(\ell)(t-) = Z + 1$. Then $N_i(\ell)(t+) = Z + 2$, and from b) it follows that at some time $t_1 < t$ node $\ell$ has sent $\mathrm{MSG}(\mathrm{LIST}_\ell)$ while $Z_\ell = Z + 2$. From <6>, <6a> we have $Z_\ell(t_1-) = Z + 1$ and $N_\ell(i)(t_1-) \ge Z + 1$. This means that there exists $t_2 < t_1$ when $i$ has sent $\mathrm{MSG}(\mathrm{LIST}_i)$ to $\ell$ while $Z_i(t_2+) = Z + 1$. But the latter and $Z_i(t-) = Z$ contradict the monotonicity of $Z_i$ (see c)).

This completes the proof of d). Observe now that e) is exactly case iv) in d). Finally, scanning cases i)-iv) of d), we see that $Z_i$ is incremented only in case ii), and f) clearly holds in this case, completing the proof of the Lemma.

Definition

The number of links on the minimum-hop path from $i$ to $k$ is called the hop-distance from $i$ to $k$.

Lemma MH-2

Under the same conditions as in Lemma MH-1, the following hold:

a) if a node $i$ has nodes at hop-distance $r$, then it sets $Z_i \leftarrow r$ in finite time and then sends $\mathrm{MSG}(\mathrm{LIST}_i)$, where $\mathrm{LIST}_i$ contains exactly all nodes $k$ that are at hop-distance $r$; for all those nodes holds $d_i^k = r$, and this $d_i^k$ is final;

b) let $S_i$ be the largest hop-distance from node $i$ in the network, i.e. node $i$ does have nodes at hop-distance $S_i$, but not at hop-distance $(S_i + 1)$; then node $i$ will set $Z_i \leftarrow (S_i + 1)$ in finite time, at which time it sends $\mathrm{MSG}(\mathrm{LIST}_i)$ with $\mathrm{LIST}_i = \emptyset$ and performs <7>; node $i$ will not increase $Z_i$ any further.

Proof

a) Setting of $Z_i \leftarrow 0$ while sending $\mathrm{MSG}(\mathrm{LIST}_i)$ with $\mathrm{LIST}_i = \{i\}$ propagates as in PI, and hence will happen at all nodes in finite time. Now suppose a) holds for all nodes that have nodes at hop-distance $(r-1)$. Consider a node $i$ that has nodes at hop-distance $r$. Then $i$ itself and all its neighbors $m$ have nodes at hop-distance $(r-1)$, and by the induction hypothesis they set $Z_m \leftarrow (r-1)$ and send $\mathrm{MSG}(\mathrm{LIST}_m)$. When such a message arrives at $i$, it sets $N_i(m) \leftarrow (r-1)$, and after all such messages arrive, <6> will hold with $Z_i = (r-1)$. This causes $Z_i \leftarrow r$. At this time we have from Lemma MH-1 f) that $N_i(m) = r$ or $(r-1)$ for all $m$.

Now suppose $k$ is at hop-distance $r$ from $i$. Then there is a neighbor $m$ of $i$ such that $k$ is at hop-distance $(r-1)$ from $m$, and there is no neighbor $m$ of $i$ such that $k$ is at hop-distance strictly less than $(r-1)$ from $m$. By the induction hypothesis, $k$ was sent by $m$ in $\mathrm{MSG}(\mathrm{LIST}_m)$ while $Z_m = (r-1)$, and hence was received at $i$ while $N_i(m) \leftarrow (r-1)$, but $k$ was sent by no neighbor $m'$ while $Z_{m'} < (r-1)$. Hence at the time $Z_i \leftarrow r$ we have $d_i^k = r$, and therefore $k$ is sent in $\mathrm{MSG}(\mathrm{LIST}_i)$. From <5a> it is clear that this $d_i^k$ is final. A similar argument shows that nodes at hop-distance $> r$ or $< r$ from $i$ cannot be included in the $\mathrm{LIST}_i$ considered above.

b) First consider a node $i$ s.t. $S_i = \min_j S_j$, where the min is over all nodes in the network. All its neighbors $m$ have nodes at hop-distance $S_i$, and by a) they send $\mathrm{MSG}(\mathrm{LIST}_m)$ while $Z_m = S_i$. When all these messages arrive at $i$, $Z_i$ will become $S_i + 1$, but since $i$ has no nodes at hop-distance $S_i + 1$, we have $\mathrm{LIST}_i = \emptyset$, and hence $i$ performs <7>. Now suppose by induction that b) holds for all nodes $i$ for which $S_i \le S - 1$. Consider a node $j$ with $S_j = S$. Node $j$ has a node $k$ at hop-distance $S$, and $k$ is included in $\mathrm{LIST}_j$ when $j$ sends $\mathrm{MSG}(\mathrm{LIST}_j)$ while $Z_j = S$. For an arbitrary neighbor $m$ of $j$, node $k$ is at hop-distance $(S-1)$, $S$ or $(S+1)$ from $m$, and hence $S_m \ge S - 1$. If $S_m \ge S$, then a) implies that $Z_m$ will become $S$ in finite time. If $S_m = S - 1$, then $Z_m$ will become $S$ in finite time from the induction hypothesis. Hence from Lemma MH-1 b), $N_j(m)$ will become $S$ in finite time for all neighbors $m$ of $j$, and hence $Z_j$ will become $(S+1)$. Since $j$ has no nodes at hop-distance $(S+1)$, <7> will hold, and this completes the proof.

From the previous Lemmas, we obtain the following:

Theorem MH-1

If START is delivered to a node (or to several nodes), then all connected nodes will enter the protocol in finite time. All nodes $i$ will complete the protocol in finite time with correct $d_i^k$ and $p_i^k$ for all connected nodes $k$, and with $d_i^k = |V|$, $p_i^k = $ nil for all disconnected nodes $k$.


Proof

The only unproven part is the setting of $p_i^k$, which, however, follows immediately from the proof of Lemma MH-2 a).

Communication cost

Since the identity of any node travels exactly once on each link in each direction, we need $|V|\log_2|V|$ bits on each link in each direction.


Path-updating protocols

In the protocol of [2], [4], each node maintains a path to each other node in the network, and updating "cycles" allow these paths to be changed so that they are improved in each cycle and, in addition, the collection of paths to any given node forms at any given time a loop-free pattern (i.e. a tree). Here we present first the fixed-topology part of the path-updating protocol and then show that protocol CT2 can be used to initialize it. The validation of both is based on the basic PIF protocol.


Protocol PU

The protocol updates paths from all nodes in the network to a given node $s$ and can be repeated independently to update paths to each of the other nodes. Therefore, we need present only the protocol for a given "destination" node $s$. The protocol works very similarly to the PIF protocol. Here, however, it is assumed that just before START is delivered to $s$, all connected nodes $i$ already have preferred neighbors $p_i^s$ such that the collection of the links $(i, p_i^s)$ forms a directed tree rooted at $s$. We also assume that at that time all $m_i^s = 0$ and, in addition, the variables $d_i^s$ as defined below are such that $d_i^s > d_{p_i^s}^s$, or in words, $d_i^s$ is strictly decreasing while moving downtree.

Finally, it is assumed that $N_i^s(\ell) = 0$ for all $\ell \in G_i$. The validation of the protocol (Theorem PU-1) will show that these properties continue to hold when the protocol is completed, so that a new update cycle can then be started.


Protocol  PU 

Variables of the algorithm at node $i \neq s$

$N_i^s(\ell)$ - same as in protocol PIF;
$d_{i\ell}$ - distance from node $i$ to neighbor $\ell$ as measured at the time it is needed by the algorithm; can be time-varying (values: any strictly positive real number), $\ell \in G_i$;
$d_i^s$ - estimated distance from $i$ to $s$ on the preferred path;
$p_i^s$ - "preferred" neighbor of $i$ for $s$;
$D_i^s(\ell)$ - storage for $d_\ell^s + d_{i\ell}$, for $\ell \in G_i$;
$m_i^s$ - equals 1 after performing <3> and before performing <4>; equals 0 otherwise.

Messages received and sent by the algorithm at i
$\mathrm{MSG}^s(d_i^s)$ - message sent;
$\mathrm{MSG}^s(\ell, d^s)$ - message received.

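Algorithm for node $i \neq s$

1. For $\mathrm{MSG}^s(\ell, d^s)$
2. $N_i^s(\ell) \leftarrow 1$; $D_i^s(\ell) \leftarrow d^s + d_{i\ell}$;
3. if $\ell = p_i^s$, then: $d_i^s \leftarrow \min D_i^s(\ell')$ over $\ell'$ s.t. $N_i^s(\ell') = 1$; $m_i^s \leftarrow 1$; send $\mathrm{MSG}^s(d_i^s)$ to all neighbors except $p_i^s$;
4. if $\forall \ell' \in G_i$ holds $N_i^s(\ell') = 1$, then: send $\mathrm{MSG}^s(d_i^s)$ to $p_i^s$; $p_i^s \leftarrow k^*$ that achieves $\min D_i^s(\ell')$ over $\ell' \in G_i$; $m_i^s \leftarrow 0$; $\forall \ell' \in G_i$, set $N_i^s(\ell') \leftarrow 0$.

The algorithm for $s$ is the same as in PIF, except that all messages sent by $s$ have format $\mathrm{MSG}^s(0)$.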
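A single PU cycle can be exercised in the same spirit. The Python sketch below is an illustration under assumptions of ours: the weights $d_{i\ell}$ are held constant during the cycle, links deliver messages in order (FIFO) with random delays, the function name `pu_cycle` is hypothetical, and the flags $m_i^s$ are left out because the algorithm uses them only in the proofs.

```python
import heapq
import random


def pu_cycle(adj, w, s, pref, dist, seed=0):
    """One update cycle of protocol PU toward destination s (sketch).

    adj: node -> list of neighbors; w[(i, l)]: measured weight d_il > 0,
    assumed constant during the cycle; pref[i], dist[i]: p_i^s and d_i^s
    for i != s (a tree rooted at s, dist strictly decreasing downtree,
    dist[s] = 0). Links are assumed FIFO with random delays.
    Returns the new (pref, dist).
    """
    rng = random.Random(seed)
    pref, dist = dict(pref), dict(dist)
    N = {i: {l: 0 for l in adj[i]} for i in adj}
    D = {i: {} for i in adj}
    events, last_arrival, seq = [], {}, 0

    def send(i, l, value, now):
        nonlocal seq
        t = max(now + rng.uniform(1e-3, 1.0), last_arrival.get((i, l), 0.0))
        last_arrival[(i, l)] = t
        seq += 1
        heapq.heappush(events, (t, seq, l, i, value))

    for l in adj[s]:                         # START at s: MSG^s(0) to all neighbors
        send(s, l, 0, 0.0)
    while events:
        now, _, i, l, value = heapq.heappop(events)
        if i == s:
            continue                         # s only collects the replies, as in PIF
        N[i][l] = 1                                        # <2>
        D[i][l] = value + w[(i, l)]
        if l == pref[i]:                                   # <3>
            dist[i] = min(D[i][x] for x in adj[i] if N[i][x] == 1)
            for x in adj[i]:
                if x != pref[i]:
                    send(i, x, dist[i], now)
        if all(N[i][x] == 1 for x in adj[i]):              # <4>
            send(i, pref[i], dist[i], now)
            pref[i] = min(adj[i], key=lambda x: D[i][x])
            for x in adj[i]:
                N[i][x] = 0
    return pref, dist
```

Calling `pu_cycle` repeatedly on a small weighted graph, one can check after every cycle that the preferred links still form a tree rooted at $s$, that $d_i^s > d_{p_i^s}^s$, and that no tree path gets longer; with fixed weights the tree eventually coincides with a shortest-path tree.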


Theorem PU-1

Suppose the assumptions given just before the presentation of the protocol hold. Then:

a) Theorem PIF-1 holds, where $p_i^s$ refers (only in this part) to the initial preferred neighbors;

b) the collection of links $\{(i, p_i^s)\}$ forms at all times a tree rooted at $s$ with the following properties:
(i) $m_i^s \le m_{p_i^s}^s$;
(ii) if $m_i^s = m_{p_i^s}^s = 0$, then $d_i^s > d_{p_i^s}^s$;

c) for each link $(i,\ell)$ the "distance" $d_{i\ell}$ is measured exactly once by node $i$; at the end of the protocol, all nodes will have paths to $s$ that are no longer than before the protocol starts, where the length of a path is the sum of the weights of the links; if initially the tree defined by $\{p_i^s\}$ is not identical to the minimum-weight tree in terms of the measured $\{d_{i\ell}\}$, then there is a nonempty set of nodes that will strictly improve their paths.


Proof

Observe that the present protocol is identical to PIF, except that <3> is performed by a node $i$ only when MSG is received from $p_i^s$ (and not as soon as the first MSG is received, as in PIF), that we introduce the quantities $d_i^s$, $D_i^s(\ell)$ and $d_{i\ell}$, and that the preferred neighbor $p_i^s$ is changed in <4>. Now, if we denote by $P_s$ the initial tree, <3> and <4> propagate here exactly as in PIF, provided that in that protocol a MSG traverses any link in $P_s$ much faster than any other link. Since Theorem PIF-1 holds for arbitrary link travel times, assertion a) follows. In order to prove b), suppose the assertions hold up to time $t-$; we want to show that if <3> or <4> happens at time $t$ at some node $i$, the assertions continue to hold.

Observe that if <3> happens at node $i$ at time $t$, then $p_i^s$ is not changed, and hence the tree property continues to hold. Also, b) ii) is not affected by <3>, and hence we only have to check that b) i) continues to hold. Since $m_i^s(t-) = 0$, we have by the induction hypothesis $m_j^s(t) = 0$ for any $j$ for which $p_j^s(t) = i$, and hence b) i) continues to hold for $j$ and $i$ after time $t$. On the other hand, when performing <3>, node $i$ receives $\mathrm{MSG}^s$ from $p_i^s$, so that $p_i^s$ must have performed <3> before $t$, implying that $m_{p_i^s}^s(t) = 1$; since $m_i^s(t+) = 1$, assertion b) i) continues to hold after $t$ for $i$ and $p_i^s$ as well.

Now suppose <4> happens at some node $i$ at time $t$. Observe that at that time $i$ has already received MSG from all neighbors, and it performs $m_i^s \leftarrow 0$. Consider first any node $j$ such that $p_j^s(t) = i$. If $p_j^s(t_0) = i$, where $t_0$ is the time the protocol started, then receipt of MSG at $i$ from $j$ means that $j$ has performed <4> before time $t$. If $p_j^s(t_0) \neq i$, then $j$ has changed $p_j^s$ before time $t$, and again this shows that it has performed <4> before time $t$. Consequently $m_j^s(t) = 0$, and hence b) i) continues to hold after time $t$ for $j$ and $i$. Also, from the way $d_j^s$ is calculated and $p_j^s$ is chosen, it follows that

$$d_j^s \ge D_j^s(i) = d_i^s + d_{ji} > d_i^s, \qquad (6.1)$$

where the last inequality follows from the assumption $d_{ji} > 0$ (see the definition of $d_{i\ell}$). Consequently b) ii) continues to hold after time $t$. Now consider the pair $i$ and $k^* = p_i^s(t+)$. Assertion b) i) holds trivially after $t$ for $i$ and $k^*$, since $m_i^s = 0$, while assertion b) ii) holds by the same argument as in (6.1). Finally, $(i, k^*)$ cannot close a loop, since by b) i) all nodes $\ell$ in such a loop must have $m_\ell^s = 0$, and going around the loop this would imply by b) ii) that $d_\ell^s > d_\ell^s$. The proof of c) is quite simple and will be omitted here; the reader is referred to similar proofs that appear in [4, Sec. 4] and [6, Appendix, Lemma 1].

Communication cost

Clearly there is exactly one message on each link in each direction. Its size in bits depends on the number of bits assigned to $d_i^s$.
Protocol PUI (path-updating initialization)

In order to allow proper evolution of the PU protocol, it is necessary to initialize it in the sense of building the initial trees $\{(i, p_i^j)\}$ for all "destinations" $j$ in the network. This can be done by using protocol CT2, and we shall give here the additions that allow initialization of protocol PU.

Variables used by the algorithm at node i

Same as in CT2, except that $d_i^j$ has the meaning as in PU, and in addition $m_i^j$, $d_{i\ell}$, $D_i^j(\ell)$ and $p_i^j$, for $\ell \in G_i$, with the same meaning as in PU.

Messages sent and received by the algorithm at node i
Same as in PU and, in addition, START.

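Algorithm for node i

Assumption: just before node $i$ enters the algorithm, it has $N_i^j(\ell) = m_i^j = 0$ for all $j$ and $\ell \in G_i$.

1. For START or $\mathrm{MSG}^j(\ell, d^j)$, $j \neq i$
1a. if MSG, then: $N_i^j(\ell) \leftarrow 1$; $D_i^j(\ell) \leftarrow d^j + d_{i\ell}$;
2. if $m_i^i = 0$, then: $M_i \leftarrow$ WORK; $m_i^i \leftarrow 1$; send $\mathrm{MSG}^i(0)$ to all neighbors;
3. if MSG and $m_i^j = 0$, then: $p_i^j \leftarrow \ell$; $m_i^j \leftarrow 1$; $d_i^j \leftarrow D_i^j(\ell)$; send $\mathrm{MSG}^j(d_i^j)$ to all neighbors except $p_i^j$;
4. if $\forall \ell' \in G_i$ holds $N_i^j(\ell') = 1$, then: send $\mathrm{MSG}^j(d_i^j)$ to $p_i^j$; $p_i^j \leftarrow k^*$ that achieves $\min D_i^j(k)$ over $k \in G_i$; $m_i^j \leftarrow 0$; $\forall \ell' \in G_i$, set $N_i^j(\ell') \leftarrow 0$.
5. For $\mathrm{MSG}^i(\ell, d^i)$: $N_i^i(\ell) \leftarrow 1$;
6. if $\forall \ell' \in G_i$ holds $N_i^i(\ell') = 1$, then: $M_i \leftarrow$ NORMAL; $m_i^i \leftarrow 0$; $\forall \ell' \in G_i$, set $N_i^i(\ell') \leftarrow 0$.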


Theorem PUI-1

Suppose START is delivered to any node. Then any given node $j$ will perform <6> in finite time, and at that time the links $\{(i, p_i^j)\}$ will form a directed tree rooted at $j$, with the property $d_i^j > d_{p_i^j}^j$ for all $i$. In addition, at that time all $m_i^j = 0$ and all $N_i^j(\ell) = 0$.

Proof

The protocol here evolves as CT2, and hence all properties of CT2 hold here. Also, for a given $j$, action <3>$^j$ evolves as in PI, so that Theorem PI-1 c) holds. Consequently, $\{(i, p_i^j)\}$ as considered after all nodes perform <3>$^j$ form a tree rooted at $j$. Also, by <1a> and <3> and the fact that $d_{i\ell} > 0$, the quantities $d_i^j$ are strictly decreasing going downtree. After <3>$^j$ is performed at all nodes, the protocol for $j$ behaves as in PU, so that all properties continue to hold until $j$ performs <6>.
Topological changes

The protocols presented so far assume fixed topology of the network. As such, the CT and MH protocols may be performed only once and, similarly, the path-updating protocol may be initialized only once (the PU protocol itself should be repeated periodically to account for load variations).

In this section, we present extensions to the above protocols that take into consideration failures and additions of links and nodes, the main idea being that whenever a topological change is sensed at some node, a new "cycle" of the protocol is triggered to inform the network of the new situation. Since we are working with a distributed network, we can make no a priori assumption regarding the number, sequence or timing of topological changes, and as such the extended protocols must work under all circumstances.

With topological changes occurring in the network, the assumptions of Sec. 2 should be changed accordingly. In particular, assumptions a) and c) of Sec. 2 will now be changed to:

a') link $(i,j)$ fails/recovers at the same time that link $(j,i)$ fails/recovers, so that $(i,j)$ belongs to the network iff $(j,i)$ belongs to the network;

c') i) each message sent by node $i$ on link $(i,j)$ arrives correctly in finite, nonzero, undetermined time, or the link fails in finite time;

ii) whenever a link fails or recovers, both ends are notified in finite time, but not necessarily at the same time;

iii) failure or recovery of a node is considered as failure/recovery of all adjacent links.

To make ii) above more precise, let $F_i(\ell)$ denote a flag indicating the status of link $(i,\ell)$ as seen from node $i$, taking values DOWN or UP if $i$ considers link $(i,\ell)$ as down or up, respectively. Then we assume:

if $F_i(\ell) = F_\ell(i) = $ UP and $F_i(\ell) \leftarrow$ DOWN, then $F_\ell(i)$ becomes DOWN in finite time and before $F_i(\ell) \leftarrow$ UP;

if $F_i(\ell) = F_\ell(i) = $ DOWN and $F_i(\ell) \leftarrow$ UP, then in finite time holds either $F_\ell(i) = $ UP or $F_i(\ell) = F_\ell(i) = $ DOWN.

Now the idea for extending the DNPs described in the previous sections to account for topological changes is the following: the cycles of the protocol will be labelled with increasing numbers, every node remembers the highest cycle number known to it so far, and each of the cycles now corresponds to the original (nonextended) protocol. When a node wants to trigger a new cycle as a result of detecting a topological change in an adjacent link, it resets its variables, increments the cycle number and acts as if it has received START for a new cycle with this number. Here "resetting variables" means adjusting the appropriate variables to their required initial values as stated in the corresponding assumption in each of the algorithms (e.g. in MH, $p_i^k \leftarrow$ nil, $d_i^k \leftarrow |V|$ for all $k$, and $Z_i \leftarrow -1$, $N_i(m) \leftarrow -1$ for all adjacent $m$). The number of the new cycle will be carried by all messages belonging to this cycle, and now any node receiving a message with a cycle number lower than the one known to it so far discards this message. A node receiving a message with a higher cycle number than the highest known to it resets its own variables, increases its remembered cycle number accordingly and acts as if it now enters the algorithm (i.e. the corresponding cycle of the extended protocol). In this way the cycle with the higher number will "cover" the lower-numbered ones, in the sense that when a higher cycle reaches any node, the node will forget its previous knowledge and will participate only in the most "recent" cycle. Observe that several nodes may start the same new cycle independently, but the protocol allows this situation to happen, treating it in the same way as if several nodes received START in the nonextended protocol.
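The cycle-number rule itself can be isolated in a few lines. In this Python sketch all names are hypothetical: the protocol-specific reset is passed in as a callback, entering the algorithm as if START had been received is left to the caller, and `reset_mh` models only $Z_i$, purely for illustration.

```python
def start_new_cycle(node, reset):
    """Adjacent topological change: reset variables and open cycle R_i + 1.

    The caller then acts as if START had been received for this cycle.
    """
    reset(node)
    node["R"] += 1
    return node["R"]


def accept_message(node, msg_cycle, reset):
    """Apply the cycle-number rule to an incoming message.

    node: dict whose key "R" holds the highest cycle number known.
    reset: callback restoring the protocol's initial values.
    Returns True if the message should be processed in the current cycle.
    """
    if msg_cycle < node["R"]:
        return False                 # older cycle: discard the message
    if msg_cycle > node["R"]:
        reset(node)                  # forget knowledge from older cycles
        node["R"] = msg_cycle        # and join the newer cycle
    return True


def reset_mh(node):
    """Hypothetical MH reset; only Z_i is modeled here."""
    node["Z"] = -1
```

A node with $R_i = 3$ thus ignores a cycle-2 message, processes a cycle-3 message with its current state, and on a cycle-4 message first resets and adopts $R_i = 4$.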

There is a question whether it is indeed necessary for all nodes to forget their entire previous knowledge, or whether it is possible to design protocols where only the information affected by the topological change is discarded. For the PU protocol, such a protocol appears in [2], [4], [7], but for the others this is still an open question.


As an example, we shall write out the extended MH protocol in full.

Protocol  EMH  (extended  MH)  -  Version  A) 

Variables  used  by  the  algorithm  at  node  i 
Sane  as  in  MH,  and  in  addition: 

"  highest  cycle  number  known  to  i  (values:  0,1,...) 

F^(A)  -  status  of  link  (i,A)  as  known  by  i  (DOWN,  UP) 

Messages  sent  and  received  at  node  i 
MSG(Rif  LIS'T)  -  sent. 

MSG(A,R,  LIST)  -  MSG (R ,  LIST)  received  on  link  (i,A). 

Algorithm  for  node  i 

k  I; 

Definition:  "reset  variables"  means  p^  nil,  d^  |v|  for  all  k,  -1, 

N^ (1)  *■  -1  for  all  A  for  which  F^(A)  ■  UP. 

1. Node i becomes operational (Note: node i becoming operational forces
all operating links $(i,\ell')$ with $\ell'$ operating to become operational
for $\ell'$)

1a. $F_i(\ell') \leftarrow \mathrm{UP}$ for all operating adjacent links $(i,\ell')$ with $\ell'$
operating;

1b. $F_i(\ell') \leftarrow \mathrm{DOWN}$ for all nonoperating adjacent links $(i,\ell')$ or
those links $(i,\ell')$ with $\ell'$ nonoperating;

1c. reset variables; $R_i \leftarrow 0$; $M_i \leftarrow \mathrm{NORMAL}$.


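2. Adjacent link $(i,\ell)$ becomes operational or fails

2a. $R_i \leftarrow R_i + 1$;

2b. $F_i(\ell) \leftarrow$ DOWN or UP according to new status;

2c. reset variables; $M_i \leftarrow \mathrm{WORK}$; $d_i^i \leftarrow 0$; $z_i \leftarrow 0$; $\mathrm{LIST}_i \leftarrow \{i\}$;

send $\mathrm{MSG}(R_i, \mathrm{LIST}_i)$ to all $m$ for which $F_i(m) = \mathrm{UP}$.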

3. For $\mathrm{MSG}(\ell, R, \mathrm{LIST})$

3a. if $R \ge R_i$, then:

3b. if $R > R_i$, then: $R_i \leftarrow R$; same as <2c>;

3c. if $M_i = \mathrm{WORK}$, then:

4.-7. same as <4>-<7> in MH, except that MSG has
format $\mathrm{MSG}(R_i, \mathrm{LIST}_i)$.
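To make the cycle-number handling of Version A concrete, here is a minimal Python sketch of steps <2> and <3a>-<3c> at a single node. It is an illustration under stated assumptions, not the author's implementation: the class `NodeA` and its fields are hypothetical names, the MH variables cleared by "reset variables" are omitted, and messages that reach <3c> are only recorded in `handled` instead of being passed through steps <4>-<7> of MH (so the mode never returns to NORMAL in this sketch).

```python
from dataclasses import dataclass, field

NORMAL, WORK = "NORMAL", "WORK"


@dataclass
class NodeA:
    """Cycle-number handling of EMH - Version A at node i (steps <2>, <3a>-<3c>)."""
    ident: int
    links_up: set = field(default_factory=set)   # neighbours m with F_i(m) = UP
    R: int = 0                                   # R_i, highest cycle number known
    mode: str = NORMAL                           # M_i
    outbox: list = field(default_factory=list)   # messages sent: (m, R_i, LIST_i)
    handled: list = field(default_factory=list)  # messages passed on to MH <4>-<7>

    def start_cycle(self):
        # <2c>: reset variables (MH variables omitted), enter WORK, send MSG(R_i, {i})
        self.mode = WORK
        for m in sorted(self.links_up):
            self.outbox.append((m, self.R, [self.ident]))

    def link_change(self, neighbour, up):
        # <2a>: every sensed change of an adjacent link starts a new cycle
        self.R += 1
        # <2b>: record the new status F_i(neighbour)
        if up:
            self.links_up.add(neighbour)
        else:
            self.links_up.discard(neighbour)
        self.start_cycle()

    def receive(self, sender, R, lst):
        # <3a>: a message from an older cycle is ignored
        if R < self.R:
            return
        # <3b>: a newer cycle covers the current one; join it as in <2c>
        if R > self.R:
            self.R = R
            self.start_cycle()
        # <3c>: in WORK mode, hand the message to the MH steps <4>-<7>
        if self.mode == WORK:
            self.handled.append((sender, R, lst))
```

For example, a node with neighbours 2 and 3 that receives a cycle-1 message from 2 joins cycle 1 and floods its own list; a later cycle-0 message from 3 is simply dropped.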

Note that <2> and <3b> here correspond to <2> in MH, while <3c>
corresponds to <3> in MH. Clearly, similar extended protocols can
be given for the CT and also for the PUI protocols. Their properties
are similar to those of EMH - Version A, as summarized in:

Theorem EMH-A-1

Consider an arbitrary finite sequence of topological changes with
arbitrary timing and location. Within finite time after the sequence
is completed, all nodes i in the final connected network will have
$M_i = \mathrm{NORMAL}$ with the same cycle number $R_i$, with correct $d_i^k$ and $p_i^k$ for
all connected nodes k, and with $d_i^k = |V|$, $p_i^k = \mathrm{nil}$ for all disconnected
nodes k.

Proof

From <2a> and <1>, each topological change increments the cycle counter
$R_i$ at the nodes i adjacent to the change. Every change in a link status
affects two nodes, and every change in a node status affects a finite
number of nodes. Let $\{i_n\}$ be the collection of nodes that register a
change of status of an adjacent link, including those due to status
changes of the node at the other end of a link, and let $\{t_n\}$ be the
corresponding collection of times when the status change is registered.
Since there is a finite number of topological changes, the collections
$\{i_n\}$, $\{t_n\}$ are finite. Let $R = \max_n R_{i_n}(t_n^+)$. Then R is
the highest cycle number ever known in the network, and the cycle with
number R is started by (one or more) nodes $i \in \{i_n\}$ that increment their
$R_i$ to R as a result of sensing a topological change. These nodes can
be considered as if they receive START in the MH protocol and, indeed,
the network covered by the cycle with number R registers no more topological
changes, since no counter number $R_i$ is ever increased to $R+1$.
Consequently, the evolution of this cycle is the same as in protocol MH,
and therefore Lemmas MH-1, MH-2 and Theorem MH-1 hold here, completing
the proof.
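The key step of the proof, that the cycle with the highest number eventually covers every node of a connected network, can be observed in a toy simulation of the cycle-number rule alone. The sketch below assumes a static topology after the last change, assumes every node has just executed <2c> (so its current $R_i$ is on all its links), delivers messages in random order, and ignores the MH part of the protocol; `flood_cycle_numbers` is a hypothetical helper name.

```python
import random


def flood_cycle_numbers(adj, R, seed=0):
    """Toy run of the cycle-number rule (<2c>, <3a>, <3b>) of EMH - Version A.

    adj: node -> set of neighbours in the final (static) topology.
    R:   node -> cycle number R_i when the last topological change was sensed.
    Returns the cycle numbers once no messages remain in transit.
    """
    R = dict(R)                               # do not modify the caller's dict
    rng = random.Random(seed)
    # Assumption: every node has just executed <2c>, so MSG(R_i, ...) is on all its links.
    in_transit = [(j, i, R[i]) for i in adj for j in adj[i]]  # (receiver, sender, R)
    while in_transit:
        receiver, _sender, r = in_transit.pop(rng.randrange(len(in_transit)))
        if r > R[receiver]:                   # <3b>: the higher cycle covers this node
            R[receiver] = r
            in_transit.extend((m, receiver, r) for m in adj[receiver])
        # r < R[receiver] is dropped by <3a>; r == R[receiver] would go on to <3c>
    return R
```

With two components, e.g. {1, 2, 3} starting from cycle numbers 3, 1, 0 and {4, 5} starting from 2, 7, every node ends with the maximum of its own component (3 and 7 respectively), whatever delivery order `seed` selects.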

In Version A of MHE as presented above, as well as in all other similar
extended protocols, the cycle counter numbers ($R_i$) increase without
bound, which raises the question of how many bits are enough to represent
them. In [1], [3] the authors propose a modified version of the extended
protocols that ensures bounded counter numbers. We present this protocol
here formally (Version C) and give a new proof of its validity. The
procedure is illustrated as before on Protocol MH, but can be implemented
similarly on CT and PUI. In order to provide a framework for understanding
Version C, it is convenient to first present and validate a non-distributed
Version B that introduces the important features of the procedure, and then
to explain the equivalence between Versions B and C.

Protocol MHE - Version B

Messages and variables: same as in Version A.

Same as in Version A, except for the following changes:

Lines <2> and <2a> become:

2. Adjacent link $(i,\ell)$ fails

2a. $R_i \leftarrow R_i + 1$ and proceed to <2b>.

2'. Adjacent link $(i,\ell)$ becomes operational

2a'. wait until $M_i = M_\ell = \mathrm{NORMAL}$ and then

2a''. if $R_i < R_\ell$, then: $R_r \leftarrow R_r + (R_\ell - R_i)$ for all nodes r
that are connected to i and, furthermore, in all messages
that have been sent by such nodes r and not received yet,
increase R by $(R_\ell - R_i)$;

2a'''. if $R_\ell < R_i$, similar to <2a''> for nodes connected to $\ell$, where
the increment is $(R_i - R_\ell)$;

2a$^{\mathrm{iv}}$. $R_i \leftarrow R_i + 1$ and proceed to <2b>.

After <1c> insert

1d. for all $\ell'$ for which $F_i(\ell') = \mathrm{UP}$, proceed as in <2a'>.
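Because Version B is non-distributed, steps <2a''>-<2a$^{\mathrm{iv}}$> can be written as one global operation. The Python sketch below assumes a global view of the operating links (`adj`) and of the messages in transit, and assumes <2a'> is already satisfied; the function name and data layout are illustrative assumptions. Both endpoints execute <2a$^{\mathrm{iv}}$>, since each senses the link coming up.

```python
def bring_link_up(i, l, R, adj, in_transit):
    """Global effect of <2a''>, <2a'''> and <2a^iv> of MHE - Version B.

    Assumes <2a'> already holds (M_i = M_l = NORMAL).
    R: node -> cycle number; adj: node -> set of neighbours over operating
    links, not yet containing (i, l); in_transit: list of [sender, receiver, R]
    messages not yet received (updated in place).
    """
    def component(start):
        # Nodes connected to `start` through operating links (depth-first search).
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    if R[i] != R[l]:                      # only possible if i and l were disconnected
        low, high = (i, l) if R[i] < R[l] else (l, i)
        delta = R[high] - R[low]
        lifted = component(low)
        for r in lifted:                  # <2a''> / <2a'''>: raise the lower side
            R[r] += delta
        for msg in in_transit:            # ...including its messages in transit
            if msg[0] in lifted:
                msg[2] += delta
    adj[i].add(l)
    adj[l].add(i)
    R[i] += 1                             # <2a^iv> at i
    R[l] += 1                             # <2a^iv> at l
```

For instance, joining node 2 (whose component {1, 2} is at cycle 2, with one message from 1 still in transit) to node 3 (at cycle 5) lifts nodes 1 and 2 and that message to 5, and then nodes 2 and 3 start cycle 6.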

The main property of Version B is given in Theorem MHE-B-1, but first
we need several definitions:

Definitions

A link $(i,\ell)$ is said to be operating if $F_i(\ell) = F_\ell(i) = \mathrm{UP}$. Two nodes
are said to be connected if there is a set of operating nodes and links
connecting them. A set of nodes is said to be connected if every pair
of nodes in the set is connected. A set of nodes S is said to be at
level R if $\min_{i \in S} R_i = R$. A set of nodes S (connected or not)
is said to be synchronized if either a) or b) below holds:

a) all nodes $i \in S$ have $M_i = \mathrm{WORK}$;

b) there is at least one node $i \in S$ with $M_i = \mathrm{NORMAL}$, and then:

i) $\forall j \in S$ with $M_j = \mathrm{NORMAL}$: $R_j = R_i$, and

ii) $\forall j \in S$ with $M_j = \mathrm{WORK}$: $R_j \ge R_i$.
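The definitions of level and of a synchronized set translate directly into predicates. The Python sketch below is illustrative only; `is_synchronized` and `level` are hypothetical names, and the modes are represented by the strings "NORMAL" and "WORK".

```python
NORMAL, WORK = "NORMAL", "WORK"


def is_synchronized(S, M, R):
    """True if the node set S is synchronized, given modes M[j] and cycle numbers R[j]."""
    normal = [j for j in S if M[j] == NORMAL]
    if not normal:                               # case a): all nodes of S are in WORK
        return True
    r = R[normal[0]]                             # case b): some node i has M_i = NORMAL
    return (all(R[j] == r for j in normal)       # i)  NORMAL nodes share R_i
            and all(R[j] >= r for j in S if M[j] == WORK))  # ii) WORK nodes are not behind


def level(S, R):
    """Level of S: the minimum of R_j over j in S."""
    return min(R[j] for j in S)
```

When case b) holds, the level of S equals the common cycle number of its NORMAL nodes, since every WORK node has a cycle number at least as large.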

Theorem MHE-B-1

In MHE - Version B, if at any time t a set of nodes S is connected, then
it is also synchronized. Furthermore, if the set is at level R at time t
and any node j becomes connected at some future time $t' > t$ to any node
$i \in S$, then it will have $R_j(t') \ge R$. (Note: the first property is the
important one; the second is needed only as an aid in the proof.)

Proof

We proceed by induction on events happening in the network. Suppose
both properties above hold up to time $t^-$. Explicitly, every set of
nodes S' that was connected at any time $\tau < t$ was also synchronized at
that time, and every node j that was connected to any node in S' at any
time between $\tau$ and $t^-$ had $R_j \ge$ level of S' at time $\tau$. The events that
can happen at time t and affect the properties of the Theorem are: (i) a
node becomes operational, (ii) a link fails, (iii) a link is brought up,
and (iv) $M_i$ and/or $R_i$ is changed. We proceed to check each of these
possibilities. First, a node that becomes operational will not connect to the
rest of the network until <1d> holds, so this case reduces to (iii).
Second, if the set was synchronized at $t^-$ and a link fails, it will remain
synchronized just after the failure, except that one has to take into
consideration that the failure causes changes of $M_i$ and $R_i$ at the adjacent
nodes. However, these changes are treated in (iv). Observe next that
(iii) can happen only if $M_i(t^-) = M_\ell(t^-) = \mathrm{NORMAL}$, where $(i,\ell)$ is the
link under consideration. Suppose first that i and $\ell$ do not belong to
two disconnected sets at time $t^-$. Then, since the set under consideration
is synchronized at time $t^-$ by the induction hypothesis, it follows
that $R_i(t^-) = R_\ell(t^-)$. Hence <2a''> and <2a'''> do not apply, and therefore
the only relevant variables that are changed are $M_i$, $R_i$, $M_\ell$, $R_\ell$ (lines
<2a$^{\mathrm{iv}}$> and on), and this again reduces to (iv). Suppose now that the
new link $(i,\ell)$ does connect two previously disconnected sets. If
$R_i(t^-) = R_\ell(t^-)$, the same argument as before applies. If, for example,
$R_i(t^-) < R_\ell(t^-)$, let t' be the time just after execution of <2a''>, but
before execution of <2a$^{\mathrm{iv}}$>. Recall that $M_i(t^-) = M_\ell(t^-) = \mathrm{NORMAL}$, and since
each of the sets is synchronized at $t^-$, we have $R_r(t^-) \ge R_i(t^-)$ for all r
connected to i and $R_r(t^-) \ge R_\ell(t^-)$ for all r connected to $\ell$, with equality
in both cases for those nodes r that have $M_r(t^-) = \mathrm{NORMAL}$. Now, from $t^-$ to t'
all nodes r connected to i raise their cycle number $R_r$ by $R_\ell(t^-) - R_i(t^-)$,
and so do all messages in transit; hence the new combined set remains
synchronized at t'. The transition from t' to $t^+$ is the execution of
<2a$^{\mathrm{iv}}$> and on, and again this reduces to case (iv), which is treated next.
Observe that $R_j$ is increased if and only if $M_j$ is WORK or becomes WORK,
and clearly if at $t^-$ the set was synchronized, it will remain so at $t^+$.
Therefore the only situation that remains to be treated is $M_i \leftarrow \mathrm{NORMAL}$, in
which case $R_i$ is not changed. Let R be the value of $R_i$ at time $t^-$ (and $t^+$).
We must show that for all nodes $j \in S_i$, where $S_i$ is the set of nodes
connected to i at time t, we have $R_j(t) \ge R$, with equality if
$M_j(t) = \mathrm{NORMAL}$. But since $S_i$ is synchronized at $t^-$ by the induction
hypothesis, the condition $M_j(t) = \mathrm{NORMAL}$ requires $R_j(t) \le R$ for all $j \in S_i$,
and therefore it is sufficient to show that $R_j(t) \ge R$ for all $j \in S_i$. At
time t, node i performs $M_i \leftarrow \mathrm{NORMAL}$; let P be the set of nodes k for which
$d_i^k(t) < |V|$. Nodes $k \in P$ certainly have $R_k(t) \ge R$. Now take any node $\alpha$
such that $\alpha \in S_i$, but $\alpha \notin P$. We want to show that $R_\alpha(t) \ge R$. Observe
that there must exist a node $\beta \in S_i$, $\beta \notin P$, such that $\beta$ is at time t a
neighbor of a node $\gamma \in S_i \cap P$ (see Fig. 1). Since $\beta \notin P$, node $\beta$ was
disconnected from $\gamma$ at some time after $R_i \leftarrow R$. Let $S_\gamma$ be the connected set
containing $\gamma$ at time $\tau^-$, where $\tau < t$ is the time when link $(\beta,\gamma)$ was
brought up. At that time $M_\gamma$ was NORMAL and $R_\gamma \ge R$, and by the induction
hypothesis $S_\gamma$ was at level $\ge R$. In addition, by the second assertion of
the Theorem, which holds at time $t^-$ because of the induction hypothesis,
the fact that $\alpha$ is connected at time $t > \tau$ to $\gamma \in S_\gamma$ implies
$R_\alpha(t) \ge$ level of $S_\gamma$ at $\tau^- \ge R$.

This completes the proof of case (iv) and shows that the connected
set remains synchronized. It remains to prove that any node that
becomes connected to any node in the considered connected set will
have at that time a counter number $\ge R$.

At time t, every node $i \in S_t$ has $R_i \ge R$, and $R_i$ never decreases. Let
t' be the first time after t when a node j' becomes connected to any
node $i \in S_t$. Since until that time all connected sets are synchronized,
it must hold that $R_{j'}(t') \ge R$ by the same argument as in (iii) above.
Consequently, after t' all sets remain synchronized, and the same argument
shows that the property remains true for all future connections, completing
the proof of the Theorem.


Having proved the main properties of Version B, we can now make a few
observations about this (non-distributed) version that will allow us to
introduce an equivalent distributed version.

Lemma MHE-B-1

In MHE - Version B, the following properties hold:

a) if <2a'> holds at time t and nodes i and $\ell$ are connected at time
$t^-$, then $R_i(t^-) = R_\ell(t^-)$, and hence <2a''>-<2a'''> are not performed.

b) Theorem MHE-A-1 holds for Version B as well.

c) for any node r, $R_r$ is nondecreasing and, unless $R_r$ is increased by
<2a''> or <2a'''>, it has increments of +1; if a node i sends two
consecutive messages on a given link with counter numbers R', R''
respectively, then $R'' \ge R'$, and if the second message is not related
to the performing of <2a''> or <2a'''>, then $R'' = R'$ or $R'' = R' + 1$.

d) if node i receives $\mathrm{MSG}(\ell, R, \mathrm{LIST})$ and $R > R_i$, then $\mathrm{LIST} = \{\ell\}$
and this message was sent by $\ell$ while increasing $R_\ell$, i.e. either
in <2c> or <3b>, but not in <6b>.

e) the values of R and $R_i$ are not necessary for the algorithm; we
only need to know whether $R < R_i$, $R = R_i$ or $R > R_i$.

Proof

a) follows from Theorem MHE-B-1. Part b) can be proved exactly as Theorem
MHE-A-1. For c), the fact that $R_r$ and the counter numbers in consecutive
messages can only increase is obvious from <2a>, <3b>, <2a''>, <2a'''>.
The rest can be proved by a common induction as follows: suppose both
properties hold until time $t^-$. The counter $R_i$ can be increased at time t
only in <2a> and <3b>, and in both cases only by +1. Furthermore, the
message with R'' can be sent by i only in <2a> while incrementing $R_i$ by 1,
in <3b> while incrementing $R_i$ to R, which is exactly $R_i + 1$ by the induction
hypothesis, or in <6b> while keeping $R_i$ at the previous level. This
proves both c) and d). Finally, e) is clear from the algorithm.

Protocol MHE - Version C

Variables used by the algorithm at node i

Same as in MH, and in addition $F_i(\ell)$ as in Version A, and:

$Q_i(\ell)$, whose meaning is $R_i - XR_i(\ell)$, where $XR_i(\ell)$ is the largest
counter number received from neighbor $\ell$ in Version B.

Messages sent and received at node i

$\mathrm{MSG}(\Delta R_i, \mathrm{LIST}_i)$ - sent, where $\Delta R_i$ has the meaning of the difference
between $R_i$ in Version B and the last $R_i$ sent on this
link.

$\mathrm{MSG}(\ell, \Delta R, \mathrm{LIST})$ - received.

Algorithm for node i

Definition: "reset variables" has the same meaning as in Version A.

1. Node i becomes operational (same Note as in Version A)

1a.-1b. same as in Versions A and B;

1c. reset variables; $Q_i(\ell') \leftarrow 0$ for all $\ell'$ for which $F_i(\ell') = \mathrm{UP}$;
$M_i \leftarrow \mathrm{NORMAL}$.

1d. if there is an operational link $(i,\ell')$ for which $M_{\ell'} = \mathrm{NORMAL}$,
proceed as in <2a'>; else wait until this happens and then
proceed as in <2a'>.

2. Adjacent link $(i,\ell)$ fails

2a. $Q_i(\ell') \leftarrow Q_i(\ell') + 1$ for all $\ell' \ne \ell$ for which $F_i(\ell') = \mathrm{UP}$; proceed
to <2b>.

2'. Adjacent link $(i,\ell)$ is operational and $F_i(\ell) = F_\ell(i) = \mathrm{DOWN}$ and
$M_i = M_\ell = \mathrm{NORMAL}$:

2a'. $Q_i(\ell) \leftarrow 0$; $Q_i(\ell') \leftarrow Q_i(\ell') + 1$ for all $\ell'$ for which
$F_i(\ell') = \mathrm{UP}$;

2b. $F_i(\ell) \leftarrow$ DOWN or UP according to new status;

2c. reset variables; $M_i \leftarrow \mathrm{WORK}$; $d_i^i \leftarrow 0$; $z_i \leftarrow 0$;
$\mathrm{LIST}_i \leftarrow \{i\}$;
send $\mathrm{MSG}(1, \mathrm{LIST}_i)$ to all $m$ for which $F_i(m) = \mathrm{UP}$.

3. For $\mathrm{MSG}(\ell, \Delta R, \mathrm{LIST})$

3'. if $\Delta R < Q_i(\ell)$, then: $Q_i(\ell) \leftarrow Q_i(\ell) - \Delta R$;

3a. else

3b. if $\Delta R > Q_i(\ell)$ (note: i.e. $\Delta R = 1$, $Q_i(\ell) = 0$), then:
$Q_i(\ell') \leftarrow Q_i(\ell') + 1$ for all $\ell'$ for which $F_i(\ell') = \mathrm{UP}$;
same as <2c>;

3b'. $Q_i(\ell) \leftarrow 0$;

3c. if $M_i = \mathrm{WORK}$, then

4.-7. same as <4>-<7> in MH, except that MSG has format
$\mathrm{MSG}(0, \mathrm{LIST}_i)$.
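The relative counters of Version C can be checked against the absolute counters of Version B. The Python sketch below implements the counter handling of <3'>-<3b'> (function `receive_c`, a hypothetical name) and compares it with the Version B rule that stores the largest received counter $XR_i(\ell)$, in a toy model of one node with three neighbours, FIFO links and instant delivery. The model assumes, as in Lemma MHE-B-1 c), that each neighbour's counter advances by 0 or 1 between consecutive messages; it does not model <2a'>, nor neighbours adopting node i's counter.

```python
import random


def receive_c(Q, links_up, l, dR):
    """Counter handling of <3'>-<3b'> at node i for MSG(l, dR, LIST).

    Q: dict neighbour -> Q_i(neighbour); links_up: neighbours with F_i = UP.
    Returns "older" (<3'>: discarded), "same" (go on to <3c>) or
    "newer" (<3b>: a new cycle starts; the caller would then do <2c>).
    """
    if dR < Q[l]:                 # <3'>: sender's counter is still below R_i
        Q[l] -= dR
        return "older"
    result = "same"
    if dR > Q[l]:                 # <3b>: sender is exactly one cycle ahead
        for m in links_up:
            Q[m] += 1             # R_i grows by 1, so every Q_i(m) grows by 1
        result = "newer"
    Q[l] = 0                      # <3b'>: R_i now equals the sender's counter
    return result


def check_against_version_b(steps=1000, seed=1):
    """Toy model: check Q_i(l) == R_i - XR_i(l) after every event and that
    Versions B and C classify every message identically."""
    rng = random.Random(seed)
    nbrs = ["a", "b", "c"]
    R_i = 0
    XR = {l: 0 for l in nbrs}     # Version B: largest counter received from l
    R_nb = {l: 0 for l in nbrs}   # neighbours' own counters (last value sent)
    Q = {l: 0 for l in nbrs}      # Version C state
    for _ in range(steps):
        if rng.random() < 0.2:
            # i senses the failure of some other link (not in nbrs): <2a>
            R_i += 1
            for m in nbrs:
                Q[m] += 1
            continue
        l = rng.choice(nbrs)
        dR = rng.choice([0, 1])   # counters step by 0 or +1 (Lemma MHE-B-1 c)
        R_nb[l] += dR
        r = R_nb[l]               # absolute counter a Version B message carries
        expected = "older" if r < R_i else ("same" if r == R_i else "newer")
        R_i = max(R_i, r)         # <3b> of Version B: R_i <- R when R > R_i
        XR[l] = r
        assert receive_c(Q, nbrs, l, dR) == expected
        assert all(Q[m] == R_i - XR[m] for m in nbrs)
    return True
```

The check illustrates Lemma MHE-B-1 e): node i never needs the absolute value of $R_i$, only how the incoming counter compares with it, and $Q_i(\ell)$ carries exactly that information.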

We have numbered the lines in Version C to correspond to the appropriate
lines in Version B. The note appearing in <3b> holds because of Lemma
MHE-B-1 c). Observe that <2a'> in Version C is equivalent to <2a''>,
<2a'''> of Version B. This is because if i and $\ell$ are connected at time $t^-$,
where t is the time of the event occurrence, then in Version B, $R_i(t^-) = R_\ell(t^-)$
from Lemma MHE-B-1 a), and <2a'> in Version C says exactly the same thing.
If, on the other hand, i and $\ell$ are disconnected at time $t^-$, the effect of
bringing $R_i(t^-)$ and $R_\ell(t^-)$ to the same level while raising accordingly
all appropriate counter numbers is equivalent to <2a'> of Version C. This
implies that Versions B and C are equivalent. Furthermore, Version C is
distributed and its counter numbers are bounded, as shown below.

Theorem MHE-C-1

a) The counter numbers $\Delta R$ in MSG take values 0 and 1 only.

b) For every i and $\ell$, the variable $Q_i(\ell) \le |E|$, where $|E|$ is the
number of links in the network.


Proof

All messages sent in the algorithm have $\Delta R = 0$ or $1$, and this proves
part a). To see that b) holds, observe that $Q_i(\ell)$ can increase only
if, while link $(i,\ell)$ is operating, node i keeps sending $\mathrm{MSG}(1, \mathrm{LIST}_i)$
to $\ell$, but $\ell$ does not respond. After the cycle corresponding to the
first of these messages covers the entire network (or is covered by
another cycle), no link can be brought up, since the lack of response from
$\ell$ does not allow any other node k to return to $M_k = \mathrm{NORMAL}$. Therefore
the worst case is when all links fail one after the other in such a
way that each increments $Q_i(\ell)$, and the total number can be no higher
than $|E| - 1$ (for all links except $(i,\ell)$) plus 1 for the case when
$(i,\ell)$ has just come up.

8. Conclusions

In this paper we have addressed the problem of providing a formal
description and validation for a number of Distributed Network
Protocols. After introducing in Sec. 3 two simple basic protocols
that form building blocks and a unifying framework for the more complex
ones, we introduced three classes of DNP's: connectivity test, minimum-hop
paths and path updating. For each we provided the algorithm for
the nodes participating in the protocol and a formal proof of its
validity, making extensive use of the properties of the basic protocol
on which it is based. Finally, we presented a unified way to extend these
protocols to the case of changes in the network topology.


Footnotes

1. The statement "For..." means "the actions taken by the processor
when receiving ..."

2. The notation <·> will always denote the corresponding line in the
Algorithm under consideration.

3. We use superscript s throughout the description of the present
protocol to explicitly indicate the node that propagates the
information.

4. We write the time in parentheses to indicate the value of a
parameter at a specific time. Also, $t^-$ and $t^+$ denote the times just
before and just after time t.